The Paris attacks did more than reveal the growing capabilities of the Islamic State to carry out attacks in the West; they revealed what some fear are chinks in America’s big data armor.
In a world where everything seems to be connected digitally and everyone seemingly leaves behind traces of digital dust that can be mined, analyzed and linked, there remain a large number of terrorist supporters and sympathizers who are what is known as clean skins: their identities and biographies are a mystery to the intelligence databases maintained by the U.S. and its allies.
The common traces that most people in modern societies leave behind—driver’s licenses, birth certificates and school records—do not exist in many parts of the developing world, said Center for Immigration Studies Executive Director Mark Krikorian, during a House Judiciary subcommittee hearing on the Syrian refugee crisis. As a result, U.S. authorities will have very little information upon which to base their decision to allow somebody into the country. “The French sent our intelligence agencies the fingerprints of the attackers in Paris and there was no trace of them anywhere in our databases—the very databases that we are supposed to be using to screen the Syrian refugees,” Krikorian said.
But the Intelligence Advanced Research Projects Agency (IARPA), the high-tech research arm of the U.S. intelligence community, has started several programs focused on the use of big data to solve this and other challenges facing the intelligence and homeland security communities. The big question, however, is whether the technologies can be developed and matured fast enough to find their way to the front lines of the war on terror.
The Janus program, for example, is focused on doing facial recognition based on imagery taken from whatever sources exist. “Those could be still frames from closed-circuit TVs, they can be photographs off of cellphones, and can match the images across different angles, different poses and different lighting conditions,” said IARPA Director Dr. Jason Matheny, in an exclusive interview with MeriTalk. The goal is to take those images and develop the ability to match them to a database of known terrorists.
“That’s a very hard problem,” Matheny said. “It requires really sophisticated machine learning approaches to make sense of all of those facial images from different angles and be able to build a model of the entire face from the range of different angles.”
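To give a rough sense of the matching step Matheny describes, the sketch below compares a probe image against a gallery of known identities. It is only an illustration, not a description of how Janus works: it assumes some upstream model has already turned each face image into a numeric feature vector, and the function names, the toy three-number vectors and the 0.9 threshold are all hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_match(probe, gallery, threshold=0.9):
    """Return (identity, score) for the closest gallery identity, or None.

    `gallery` maps an identity label to a list of feature vectors, one per
    stored image, so a person photographed from several angles is scored by
    the best-matching view. The threshold is an arbitrary, hypothetical value.
    """
    scored = [
        (max(cosine_similarity(probe, template) for template in templates), identity)
        for identity, templates in gallery.items()
        if templates
    ]
    if not scored:
        return None
    score, identity = max(scored)
    return (identity, score) if score >= threshold else None


# Toy vectors standing in for the output of a real face model.
gallery = {
    "person_A": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],  # two poses of one person
    "person_B": [[0.0, 1.0, 0.0]],
}
print(best_match([0.95, 0.05, 0.0], gallery))
print(best_match([0.0, 0.0, 1.0], gallery))
```

Storing several vectors per identity is one simple way to cope with different angles and lighting. The hard part Matheny points to, producing vectors that stay stable across poses in the first place, is outside this sketch.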
Another program underway at IARPA, code-named Aladdin, is focused on searching through online video. “Before an event, a martyrdom video might be posted, a how-to video on how to produce an explosive might be posted, including on [sites] such as YouTube,” said Matheny. “And the challenge is that the posters of these videos don’t want these videos to be found except by the people within their group. So they’re not tagging the video. For those sorts of videos, we really need a way of searching for them that doesn’t rely on tags or user-generated content, but instead actually looks through the video itself to describe what’s happening in the video.”
Knowing What to Worry About
Time is the currency of the intelligence business. Officials just don’t have time to worry about the wrong things; they have to ensure they are focused on the threats that really matter. And that’s where IARPA’s Fuse program comes into play.
“For that we do a different kind of big data analytics,” Matheny said. “We have a program called Fuse that constantly mines the publications and patent databases and other data for indicators of an emerging technology that we should be worrying about.”
But big data has its limitations, and one of the most significant is not having enough data. This is particularly important for the counterterrorism mission.
“You really need data on somebody in order to make a judgment,” said Matheny. “The probability that any one person is a threat is extremely low. If it’s a uniform probability, it’s 1 in 7 billion. So, I think the likelihood [of somebody posing a threat] really does need to be informed by the data that you have that is in some way correlated with risk,” he said. “One of the ways in which IARPA has tried to invest in research to develop risk assessment is ensuring that data that we do have about individuals can be used in a timely way to inform risk assessments without sacrificing security and privacy.”
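Matheny's point about low base rates can be made concrete with Bayes' rule. The sketch below starts from the 1-in-7-billion uniform probability he cites and updates it with a single risk indicator. The indicator's 99 percent hit rate and 1 percent false-alarm rate are made-up numbers chosen only for illustration, and the function name is hypothetical.

```python
def posterior_probability(prior, true_positive_rate, false_positive_rate):
    """Bayes' rule: probability of a threat given that the indicator fired."""
    flagged_and_threat = true_positive_rate * prior
    flagged_and_benign = false_positive_rate * (1.0 - prior)
    return flagged_and_threat / (flagged_and_threat + flagged_and_benign)


# Uniform prior quoted in the interview: 1 in 7 billion.
uniform_prior = 1 / 7_000_000_000

# Hypothetical indicator: fires for 99% of real threats, 1% of everyone else.
posterior = posterior_probability(uniform_prior, 0.99, 0.01)

print(f"prior:     {uniform_prior:.2e}")
print(f"posterior: {posterior:.2e}")
print(f"increase:  {posterior / uniform_prior:.1f}x")
```

Under these assumed rates, one seemingly accurate indicator raises the probability roughly 99-fold, yet the result is still only about 1 in 70 million. That is why the estimate has to be informed by several pieces of data that actually correlate with risk, as Matheny says.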
Another challenge is aligning the databases in a way that allows them to talk to each other, according to Matheny. An example he points to is identifying an individual who has the characteristics that lead you to believe they might be a terrorist and matching a record of a suspected terrorist to things like airline tickets. “For that, the databases have to line up with one another in a way that they can talk,” he said.
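As a small illustration of what it takes for two databases to "talk," the sketch below links a watchlist record to an airline booking even though the two systems spell and order the name differently. It is not drawn from any IARPA program: the field names, the sample records and the 0.85 similarity threshold are all hypothetical, and it assumes both systems store dates of birth in the same format.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize_name(name):
    """Strip accents, punctuation, case and word order from a personal name."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    tokens = without_accents.replace(",", " ").replace("-", " ").lower().split()
    return " ".join(sorted(tokens))


def name_similarity(a, b):
    """Similarity between two normalized names, from 0.0 to 1.0."""
    return SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()


def link_records(watchlist, bookings, threshold=0.85):
    """Pair watchlist entries with bookings that share a birth date and a similar name."""
    matches = []
    for entry in watchlist:
        for booking in bookings:
            # Hypothetical schemas: the two systems name the same field differently.
            if entry["dob"] != booking["birth_date"]:
                continue
            score = name_similarity(entry["full_name"], booking["passenger"])
            if score >= threshold:
                matches.append((entry["id"], booking["ticket"], round(score, 2)))
    return matches


# Invented sample records for illustration only.
watchlist = [{"id": "W1", "full_name": "José Martínez", "dob": "1980-01-02"}]
bookings = [
    {"ticket": "T1", "passenger": "MARTINEZ, JOSE", "birth_date": "1980-01-02"},
    {"ticket": "T2", "passenger": "John Smith", "birth_date": "1980-01-02"},
]
print(link_records(watchlist, bookings))
```

Even this toy case needs agreement on field names, date formats and name spelling before a match is possible. Real systems also have to handle transliteration, missing fields and far larger volumes, which this sketch ignores.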
IARPA currently has a program called Knowledge Discovery and Dissemination focusing on this challenge.
“There will be no shortage of problems in big data analytics related to security and data alignment,” said Matheny. “And once you actually get the data together, performing the right kinds of analysis is [critical]. This will not be a problem that we solve in the next several years.”
Listen to an audio podcast of MeriTalk’s interview with Dr. Jason Matheny on MeriTalk Radio’s SoundCloud Channel.