Federal agencies are generating, collecting, and moving vast amounts of data into cloud infrastructures. But how can agency data analysts optimize this data to make it useful and shareable with other agencies, especially those whose missions complement one another, as well as with the general public?
Data sharing between agencies enhances the government’s effectiveness. The Environmental Protection Agency has multiple data-sharing agreements with other government entities, such as the departments of Defense, Energy, and Transportation, while U.S. Geological Survey water data might benefit analysts at the Department of Agriculture or the Forest Service.
According to Tommy La, managing director and analytics and data management strategy leader with Accenture Federal, agency tech managers should first make sure they have gone through the proper Authority to Operate (ATO) process to secure all of their data before putting it in the cloud. Once the data is in the cloud, they must focus on data governance, data tagging, and aggregation, as well as implement the proper disclosure avoidance methods to ensure integrity and confidentiality.
“Before data makes its way into the cloud, there are rigorous controls and security that must be put into place. This is the general process the government goes through to ensure that their applications have the authority to operate,” La said.
The ATO verifies that the agency has met the controls and security authorization requirements necessary to have a data footprint in the cloud. Ultimately, agency managers want to assure other government entities and the public that they are being proper stewards of citizen data and other information they collect.
Next, managers must focus on the governance of the data. At this stage, organizations need to ensure that data is properly tagged because tagging provides an agency with the business metadata necessary to find it later.
Organizations are also using tagging to secure data and implement role-based access control. For instance, a health facility might want to restrict access to patient healthcare records to doctors in the northeast region or from a particular entity, while another administrator who supports the entire nation has access to everything. “It is through that tagging and the governance structure, and through the standards put in place, that allows you to have that role-based access of the data,” La said.
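The tag-driven access pattern La describes can be sketched in a few lines. This is a minimal illustration, not any agency's actual schema: the record tags, role names, and scopes below are hypothetical.

```python
# Minimal sketch of tag-based, role-based access control (RBAC).
# All tags, roles, and scopes are hypothetical illustrations.

RECORDS = [
    {"id": 1, "tags": {"region": "northeast", "type": "patient_record"}},
    {"id": 2, "tags": {"region": "southwest", "type": "patient_record"}},
]

ROLES = {
    "northeast_doctor": {"region": {"northeast"}},  # scoped to one region
    "national_admin": {},                           # empty scope = nationwide access
}

def can_access(role, record):
    """A role may read a record only if every scoped tag matches."""
    scope = ROLES[role]
    return all(record["tags"][tag] in allowed for tag, allowed in scope.items())

# The regional doctor sees only record 1; the national admin sees both.
print([r["id"] for r in RECORDS if can_access("northeast_doctor", r)])  # [1]
print([r["id"] for r in RECORDS if can_access("national_admin", r)])    # [1, 2]
```

Because access decisions read the tags on each record rather than a hard-coded list of users, the same tagging that makes data findable also enforces who may see it.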
After data governance processes and security policies are in place, data analysts can aggregate the data. They must determine whether the information will be exposed to researchers as raw data or provided to executives at an aggregate level. That decision determines how analysts prepare the data, raw or aggregated, for public use, La explained.
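The two exposure levels can be shown with a toy dataset. The field names and values here are hypothetical stand-ins; the point is that the same raw rows a researcher would receive collapse into the per-group summary an executive would see.

```python
# Sketch: one dataset, two exposure levels, raw rows for researchers
# vs. an aggregate summary for executives. Field names are hypothetical.
from collections import defaultdict

raw_rows = [
    {"state": "VA", "reading": 4.2},
    {"state": "VA", "reading": 3.8},
    {"state": "MD", "reading": 5.1},
]

def aggregate_by(rows, key, value):
    """Collapse raw rows into a per-group count and mean."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {k: {"count": len(v), "mean": sum(v) / len(v)} for k, v in groups.items()}

summary = aggregate_by(raw_rows, "state", "reading")
print(summary["VA"])  # {'count': 2, 'mean': 4.0}
```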
After the data is prepared, the last step is to implement disclosure avoidance methods, which ensure users cannot inadvertently identify the individuals the data is tied to. “This process scrambles the data to ensure we are not releasing any personally identifiable information. It is all about ensuring the data is secured, but also taking a step beyond that to ensure that the data being released can’t be reverse-engineered [so someone] is able to identify the citizen’s data,” La said.
Reverse-engineering data is a significant threat that makes Federal agencies want to clamp down and not release any data, La said. However, with the OPEN Government Data Act signed into law in January 2019, agency managers will have to think hard about data protection. The Open, Public, Electronic, and Necessary (OPEN) Government Data Act requires that all non-sensitive government data be made available in machine-readable formats by default and establishes Chief Data Officers (CDOs) at Federal agencies.
Combination of Tools
As might be expected, a wide range of tools and processes is needed to make data more useful and shareable. “When you think about analytics and data sharing, it is all about what is my core enterprise data strategy?” La said. Consequently, agency managers will need to figure out what they are trying to achieve. What are the use cases? What are the processes behind governance, and what data do they want to release? The answers to these questions determine how data analysts will engineer the data, as well as build and consolidate data sets to release them, share them, or display reports and visualizations, La explained.
The good news is that many cloud providers, including Amazon Web Services, Google, and Microsoft, offer toolsets that help users through the whole life cycle of optimizing and moving data.
“The toolsets we are starting to find within the cloud, [provided by] cloud providers are getting a lot more mature,” La said. “In the past, you would have had to combine a lot of traditional Extract, Transform, and Load (ETL) toolsets.” But now, cloud providers are offering mature toolsets with soup-to-nuts capabilities, providing databases, platforms, and a means to move datasets around.
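The three ETL stages those traditional toolsets handled can be sketched in miniature. The source and destination below are in-memory stand-ins for real databases or cloud storage, and the record layout is hypothetical.

```python
# Toy extract-transform-load (ETL) sketch: pull raw lines from a
# source, clean and type them, and load them into a target store.
source = ["  Alice , 34 ", " Bob , 29 "]  # extract: raw CSV-like lines

def transform(line):
    """Trim whitespace and convert fields to proper types."""
    name, age = [field.strip() for field in line.split(",")]
    return {"name": name, "age": int(age)}

destination = [transform(line) for line in source]  # load into the target
print(destination[0])  # {'name': 'Alice', 'age': 34}
```

Cloud-native pipelines bundle these stages with storage and orchestration, which is the maturation La describes.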