Overcoming Dark Data Roadblocks in Agency Cloud Migration


As Federal agencies continue to amass vast amounts of data, it’s inevitable that some portion of it won’t hold a ton of value to an agency’s mission. It could be a simple email to schedule your next meeting, or it could be notes scribbled on a Word document and stashed on an agency server. Outside of its immediate use at that point in time, there’s not a strong need to store it in perpetuity.

As agencies move their IT systems into the cloud, the problem of extraneous data becomes an even larger issue. Not everything can, or frankly should, come along during that migration. Today at the American Council for Technology and Industry Advisory Council’s Imagine Nation ELC Conference, Federal leaders discussed how agencies might better determine the value of their data as they prepare to move it.

“You don’t want to take everything to the cloud, especially if you’ve got a bunch of emails that say, ‘Hey, are we going to lunch?’ You don’t want those. You don’t need them. They’re not of any value,” said Torrin Cummings, senior manager for Intranet Portal Solutions at the Internal Revenue Service.

The problem Cummings is referring to is that of dark data: data created through various network operations that doesn’t have an immediate impact on the organization’s decision making. It doesn’t add value to business intelligence or analytics, or help the organization gain insight, and much of it is generated merely for compliance purposes.

Cummings said the IRS is beginning to use tools to shine a light on dark data. “We are actually looking at all of our old drives, all of those shared drives,” he said. “If it’s a record, so it can be dispositioned correctly. If it’s something that’s no longer needed, we need to make sure this is disposed of correctly.”

While dark data may not help agencies gain insight, it’s easy to see how its potential misuse could still present a host of security ramifications. Dark data can comprise personally identifiable information or proprietary information that agencies wouldn’t want in the wrong hands.

The sheer volume of data, and the time to sift through it, makes this a problem that may well seem “insurmountable,” Cummings said. “Finding the tools, finding the time, finding the resources becomes key,” he added.

“Think through how to start talking about the value of your data as you start applying limited resources in a fiscally-constrained environment,” said Donna Roy, executive director of the Information Sharing and Services Office at the Department of Homeland Security. “What is the value of the data that you need to worry about most, and that you need to spend your daylight on, your funding on, your talent on, so that you’re putting yourself in a better position?” she asked.

Others said that good data governance is essential, and that it can be aided simply by thinking twice about standard practices for cataloguing and saving files.

“Try to keep your data very vertical and not horizontal,” said Shane Perry, IT Specialist at the Centers for Medicare and Medicaid Services. “The deeper you get, the harder it’s going to be to find it. So keep it at a high level, tag it twice and call it a day, two folders tops. Try to force people to keep data in a way that everybody can find it, and not just their brain can find it.”

As cloud migration and overall IT modernization move forward, agencies don’t want excess baggage in their path. “Dark data can actually prevent your capability to continue some of the modernizations, because it’s in the way,” Cummings said, offering some additional advice for those looking to rein in their data management practices.

“You might not have time to buy a new tool. Look at what you already have, see if that helps you get there,” he said. “If there’s a gap, find whatever you need to do to fill that gap in order to identify your unstructured, your dark data, your dusty data. Then start putting those governance pieces around it so that you actually can start controlling how it’s created and what it’s used for in the future.”

 
