MIT Finds Privacy Risks in Anonymized Mobile Data

Even anonymity doesn’t guarantee privacy. Not even in a crowd of millions.

That’s the finding of a new study by Massachusetts Institute of Technology (MIT) researchers, who report that anonymized mobility data can still pose privacy risks when it is combined with data from other sources. Data, and lots of it, is widely seen as the key to better planning for cities, transportation lines, and any kind of mobility service. But collecting all that data carries an unintended privacy risk, even when pains are taken to protect people’s identities.

In the broader cybersecurity picture, the findings reveal another potential weakness that could emerge from the proliferation of mobile applications.

The research team performed the first-ever study of what’s called user “matchability,” using two massive, anonymized datasets collected in Singapore. Both were low-density, meaning few records were generated for each user per day. One set, from a mobile network operator, consisted of timestamps and geographic coordinates in more than 485 million records from over two million users. The other came from a transportation system and included more than 70 million records with timestamps for individuals moving through the city.

The “tell” in the data was the location stamps, which combine geographic coordinates and timestamps. They exist in both datasets and are, the researchers said, very specific to individuals. Merging the two datasets allowed the team to determine the probability that certain data points in each set came from the same person. With a week’s worth of data, they estimated they could identify an individual’s personal pattern about 17 percent of the time. After a month, the number rose to 55 percent. And after 11 weeks, the probability went up to 95 percent.
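To make the idea concrete, here is a minimal sketch in Python of how location stamps from two anonymized datasets might be compared. It is not the MIT team’s method, which estimates matching probabilities; it simply counts how often a pseudonym in one dataset shares a coarse place-and-time bucket with a pseudonym in the other. The function names, the 0.01-degree grid, the one-hour window, and the sample records are all assumptions made up for illustration.

```python
import math
from collections import defaultdict


def bucket(lat, lon, ts, cell=0.01, window=3600):
    """Coarsen one record into a (grid cell, time window) key.

    cell is in degrees and window is in seconds; both are illustrative.
    """
    return (math.floor(lat / cell), math.floor(lon / cell), int(ts // window))


def match_scores(dataset_a, dataset_b, cell=0.01, window=3600):
    """Count shared place-and-time buckets between pseudonyms.

    Each dataset maps a pseudonym to a list of (lat, lon, unix_ts) records.
    Returns {user_a: {user_b: number_of_shared_buckets}}.
    """
    # Index dataset B by bucket so each lookup is a dictionary hit.
    index_b = defaultdict(set)
    for user_b, records in dataset_b.items():
        for lat, lon, ts in records:
            index_b[bucket(lat, lon, ts, cell, window)].add(user_b)

    scores = {}
    for user_a, records in dataset_a.items():
        counts = defaultdict(int)
        # De-duplicate buckets so repeated pings in one hour count once.
        keys_a = {bucket(lat, lon, ts, cell, window) for lat, lon, ts in records}
        for key in keys_a:
            for user_b in index_b.get(key, ()):
                counts[user_b] += 1
        scores[user_a] = dict(counts)
    return scores


def best_match(scores, user_a):
    """Return the dataset-B pseudonym sharing the most buckets, or None."""
    candidates = scores.get(user_a, {})
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Hypothetical sample data: pseudonym -> [(lat, lon, unix_ts), ...]
telecom = {
    "t1": [(1.3005, 103.8505, 1000), (1.3505, 103.9005, 8000), (1.2805, 103.8005, 20000)],
    "t2": [(1.3005, 103.8505, 1000), (1.4005, 103.7505, 9000)],
}
transit = {
    "c9": [(1.3005, 103.8505, 1200), (1.3505, 103.9005, 8100), (1.2805, 103.8005, 20500)],
    "c7": [(1.4005, 103.7505, 9100)],
}

scores = match_scores(telecom, transit)
print(best_match(scores, "t1"))  # prints c9
```

In this toy example, “t1” shares three buckets with “c9” and none with “c7,” while “t2” shares only one with each, so three observations are enough to single out one pairing but two are not. That mirrors, in a very simplified way, why the reported match rate climbs as more weeks of data accumulate.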

And because, these days, nearly everything collects data, it’s not much of a jump to see how data from phone calls, smart trip cards, credit card transactions, Twitter accounts, and mobile applications could be combined to reveal a person’s identity, since it’s unlikely that any two people follow exactly the same pattern throughout the day. “In short, if someone has my anonymized credit card information, and perhaps my open location data from Twitter, they could then deanonymize my credit card data,” said Carlo Ratti, a professor of the practice in MIT’s Department of Urban Studies and Planning, director of MIT’s Senseable City Lab, and a co-author of the team’s paper, which was published in IEEE Transactions on Big Data.

“As researchers, we believe that working with large-scale datasets can allow discovering unprecedented insights about human society and mobility, allowing us to plan cities better,” said Daniel Kondor, a post-doctoral researcher in the Future Urban Mobility Group at the Singapore-MIT Alliance for Research and Technology. “Nevertheless, it is important to show if identification is possible, so people can be aware of potential risks of sharing mobility data.”

The growing use of mobile applications, with their accompanying geolocation features, can pose a threat, particularly when combined with big data analytics and artificial intelligence. The Pentagon had a rude awakening early this year when it was revealed that fitness tracking data posted online (in that case, by the fitness app company Strava) could reveal the activities of military personnel, including those working in sensitive areas. The Department of Defense put restrictions on the use of GPS tracking apps in the wake of that incident, but similar threats could crop up, given the prevalence of mobile computing. The Army Corps of Engineers, for instance, uses its home-grown Mobile Information Collection Application to allow its personnel to transmit real-time information in response to disasters, as they did recently during floods in Florida and Texas and the wildfires in California.

Meanwhile, it seems worth paying attention to research that reveals potential weaknesses. “In publishing the results–and, in particular, the consequences of deanonymizing data–we felt a bit like white hat or ethical hackers,” Ratti said. “We felt that it was important to warn people about these new possibilities [of data merging] and [to consider] how we might regulate it.”