A senior National Institutes of Health (NIH) technology official talked about the agency’s strategy to leverage cloud services to further its data science strategic plan during a September 15 Federal News Network event.
Susan Gregurick, NIH’s Associate Director for Data Science and Director of the Office of Data Science Strategy, discussed the strategic plan’s five pillars and how the agency is using the cloud to carry out that plan effectively. She also described how her office is adopting new technologies, such as artificial intelligence and cloud services, to deliver data quickly, cheaply, and securely.
Gregurick called the strategic plan, which NIH launched in 2018, a “living North Star” for the agency. The plan’s five pillars focus on:
- Data infrastructure;
- Creating a “FAIR and CARE” data ecosystem, one that makes data findable, accessible, interoperable, and reusable (FAIR) and takes responsible care of it (CARE);
- Leveraging technologies in other fields or agencies;
- Building a strong and diverse data science workforce; and
- Sustaining data policies.
The second pillar, creating a “FAIR and CARE” data ecosystem, was one of the main topics of the conversation. Gregurick spoke passionately about giving researchers easy and affordable access to important data, such as data on COVID-19 infections. Leveraging new technologies, she said, is the best way to do this.
Currently, researchers wait at least two weeks to gain access to data after requesting it. But according to Gregurick, NIH will link its data platforms to the cloud at the end of the current fiscal year to give researchers near-real-time access to the information.
“At least within a 24-hour turnaround, and I think we can do it,” she said of the hoped-for access window. “Making that near to real time data access is a priority and leveraging the cloud … infrastructure is another way to do that.”
Gregurick said she believes the future of data science is a “data mesh” of interconnections between NIH and other Federal agencies, which means breaking down data silos.
“[The cloud] has not only made it possible for us to talk about data, but for us to really realistically share data as well to enable collaborations,” Gregurick said. “That’s something we’re pushing forward aggressively this year.”
Breaking down these data silos will allow NIH to build platforms tailored to specific missions while also connecting to larger systems economically and easily, she said.
NIH currently holds 173 petabytes of data across three cloud platforms, Gregurick explained. By leveraging the cloud, NIH has saved $10 million in data storage costs, she said.
“The other benefit of being in the cloud is just the scale in which analytics can run,” Gregurick said. “There is a lot of ways in which you can figure your analytics in the cloud. You can optimize for different things, and you can really do some high-powered analytics at a really reasonable rate.”
She continued, “I know there’s a lack of understanding about how to use the cloud. But I think once you explore it, you’ll find that in some cases this is a really very, very powerful tool.”