Year 2025 – An Age of Machine Learning and Data On-Demand

The size of the digital universe will grow to 176 zettabytes by 2025, leading to a future of machine learning that could have significant ramifications for the defense and intelligence communities, officials said.

Speaking at an event hosted by Defense One, James Harris, chief technology officer of the Defense Intelligence Agency (DIA), and Jason Matheny, director of the Intelligence Advanced Research Projects Activity (IARPA), said speed and automation will be key to the future of intelligence collection and analysis.

DIA is testing machine learning to understand the full range of its capabilities. One example is a machine that runs through all of the resumes a company has received and matches them with job vacancy announcements, Harris said. While this is a small-scale example, it opens up the possibility of machines running through all open-source data. Once machines conquer smaller-scale data sets, IT can focus on rapidly moving data sets that fall under the big data category.

“It’s really about speed,” Harris said. “The human beings that do that right now…where human beings have to sit down and go through thousands of resumes, now we can do this in what I’m predicting is going to be a matter of minutes,” he said. “Now we want to take that and open it up to Twitter feeds and other data sets that are moving rapidly, that adhere to the definition of big data…and be able to apply machine learning.”

Patrick Tucker, Defense One, James Harris, DIA, and Dr. Jason Matheny, IARPA (Photo: MeriTalk)

Automation and machine learning will continue to help analysts in the future. Currently, analysts spend much of their time hunting for relevant information within these large data sets. If this process is automated, data would be available on demand, freeing analysts to gain insight from the data instead of searching for it.

But the growing reliance on and use of open-source systems leads to an overwhelming amount of data. Harris said DIA is focused on making the Intelligence Community Information Technology Enterprise (ICITE) initiative operational. Integrating cloud, social media, mobile, big data analytics, and more enhances the performance of the analyst.

Matheny described ICITE as a “tremendous benefit” that allows analysts to discover data, and patterns within that data, when they need it.

The aftermath of the Edward Snowden disclosures at the NSA brought about an insider threat task force to create a plan for dealing with threats and the potential loss of information. IARPA’s SCITE program (Scientific advances to Continuous Insider Threat Evaluation) leverages data to detect behaviors that indicate someone is misusing a system or information. Continuous evaluation of insider threats will be a part of everyone’s information environment 30 years from now, Matheny said.

However, the information environment of the future faces challenges. Matheny cited the physical limits of computation as a main barrier, saying future capabilities will require computing power many orders of magnitude beyond what is currently possible. In fact, because semiconductors require massive amounts of power for tasks like exascale computing, IARPA is actively studying and investing in a new class of computer processors that do not rely on semiconductors.

“Using superconductors, we can get about 100 times more energy efficient than we can with semiconductors, and we can also significantly reduce the physical footprint of computers,” Matheny said. “If today you were to build an exascale computer out of semiconductors, it would require something approaching half of a football field size rack. With superconducting computing you could bring that down to something smaller than the size of this stage.”

Harris pointed to bureaucratic challenges, specifically learning how to overcome traditional acquisition processes and speed up the acquisition of leading-edge technology. Agencies are already behind the curve by the time IT can move forward with an updated system. We need to enable the government to move more quickly and keep up with the rapidly growing technology field.

Where does this leave government IT? We need to close the gap between raw data and useful outcomes. Cloud technology has aided the on-demand computing process by pulling together more data in a scaled manner. But government needs to speed up its acquisition process if we want to increase our use of machine learning and automated processes.