By 2020, there will be three times as many connected devices in the world as people.
Information from all of those devices will feed into data lakes, which need to be broken down and sorted, according to Brian Houston, chief technology officer of Hitachi’s Federal data systems. Houston, who spoke at MeriTalk’s Big Data Brainstorm Nov. 16, said managing incoming data is just as important as understanding existing data. He said that organizations spend up to 85 percent of their time breaking down data silos, and that data analytics will come to a stalemate if data scientists are not allowed to mine information.
“There are not enough human eyeballs to sift through all the data out there,” said Alan Ford, director of government systems pre-sales at Teradata. “We’re not going to be able to keep up with analysis on Internet of Things (IoT) data with our current methods.”
According to David Shuman, industry leader for retail, manufacturing, and IoT at Cloudera, metadata is the way to avoid getting bogged down in too much data. Metadata is data that describes other data. Shuman said metadata is “a friend” and can be used to create a tiered storage system that keeps a data lake from turning into a data swamp.
Data can be sorted through tags, similar to the ones used on Twitter to denote an event or trend. Because some information, such as health records and legal files, demands more protection, Shuman said that data analysts must be able to tag information quickly as they sift.
Houston said between 60 and 70 percent of data has not been accessed or modified in six months or a year. Studying metadata can prevent vast amounts of information from getting lost or growing stale. Metadata allows data managers to see who has modified information and deleted files, according to Houston. He said that making data usable, while keeping it normalized, allows constituents to use the information to find solutions.
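As a rough illustration of the idea the speakers describe, the sketch below assigns a storage tier using only a file's metadata: its tags and its last-access date. It is a minimal example, not a method presented at the event. The `FileMetadata` class, the `assign_tier` function, the tag names, the sample file names and dates, and the 180-day cutoff (standing in for "six months") are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical tags marking records that need extra protection.
SENSITIVE_TAGS = {"health", "legal"}


@dataclass
class FileMetadata:
    """Data that describes a file, kept separate from the file's contents."""
    name: str
    last_accessed: datetime
    tags: set = field(default_factory=set)


def assign_tier(meta, now, stale_after=timedelta(days=180)):
    """Pick a storage tier from metadata alone, without opening the file."""
    # Sensitive tags take priority over age.
    if meta.tags & SENSITIVE_TAGS:
        return "protected"
    # Untouched for longer than the cutoff (assumed here to be 180 days).
    if now - meta.last_accessed > stale_after:
        return "cold"
    return "hot"


# Arbitrary reference date and made-up catalog entries.
now = datetime(2016, 11, 16)
catalog = [
    FileMetadata("sensor_feed.csv", datetime(2016, 11, 1), {"iot"}),
    FileMetadata("patient_intake.pdf", datetime(2016, 10, 3), {"health"}),
    FileMetadata("old_survey.xlsx", datetime(2015, 12, 1), {"survey"}),
]

tiers = {m.name: assign_tier(m, now) for m in catalog}
for name, tier in tiers.items():
    print(f"{name}: {tier}")
```

Because the decision depends only on tags and timestamps, a catalog like this could be re-tiered on a schedule without reading the underlying data.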
“Metadata truly is the keys to the kingdom,” Houston said.