A site devoted mostly to everything related to Information Technology under the sun - among other things.

Tuesday, June 11, 2013

Unique in the Crowd

In the paper "Unique in the Crowd: The Privacy Bounds of Human Mobility," published in Nature's Scientific Reports in March 2013, Yves-Alexandre de Montjoye, César A. Hidalgo, Michel Verleysen, and Vincent D. Blondel of MIT and the Université catholique de Louvain examined a dataset of 15 months of anonymized cell-phone data from 1.5 million people, most likely in Belgium.

There were no names, addresses, or phone numbers in the data, yet the authors argue that "if individual's patterns are unique enough, outside information can be used to link the data back to an individual." In fact, just four spatio-temporal points, each consisting of the time of a call and the nearest cell-phone tower, were enough to uniquely identify 95 percent of the individuals in the dataset.

That is, knowing the times and places of just four calls a person made over the 15-month period is enough to single out that person's pattern of movement in a population about two and a half times the size of Washington, D.C.
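The measure behind these numbers, which the paper calls unicity, can be sketched as follows: for each person, draw p points at random from their own trace, then check how many traces in the whole dataset contain all p points. The person is unique if their own trace is the only match. This is a minimal sketch on made-up toy data, not the authors' code; the `(hour, tower_id)` encoding, the function names `unicity` and `min_points`, and the choice to skip traces shorter than p are all assumptions for illustration. The `min_points` helper searches for the smallest p at which a target fraction (for example 95 percent) of people become unique.

```python
import random
from typing import Dict, Optional, Set, Tuple

# Hypothetical encoding of one observation: (hour index, tower id).
Point = Tuple[int, int]


def unicity(traces: Dict[str, Set[Point]], p: int, seed: int = 0) -> float:
    """Fraction of users whose trace is the only one containing p points
    drawn at random from that same trace."""
    rng = random.Random(seed)
    # Assumption of this sketch: users with fewer than p points are skipped.
    eligible = [user for user, trace in traces.items() if len(trace) >= p]
    if not eligible:
        return 0.0
    unique = 0
    for user in eligible:
        # The "outside information": p points an observer happens to know.
        known = set(rng.sample(sorted(traces[user]), p))
        # Every trace consistent with what the observer knows.
        matches = [u for u, trace in traces.items() if known <= trace]
        if len(matches) == 1:  # only the user's own trace fits
            unique += 1
    return unique / len(eligible)


def min_points(traces: Dict[str, Set[Point]], target: float = 0.95,
               max_p: int = 10, seed: int = 0) -> Optional[int]:
    """Smallest p whose estimated unicity reaches `target`, or None."""
    for p in range(1, max_p + 1):
        if unicity(traces, p, seed) >= target:
            return p
    return None


# Toy data: three made-up users, three observations each.
traces = {
    "alice": {(8, 1), (12, 2), (18, 3)},
    "bob": {(8, 1), (12, 2), (19, 4)},
    "carol": {(8, 1), (13, 2), (18, 3)},
}
print(unicity(traces, 3))              # 1.0: every full trace is distinct
print(min_points(traces, target=1.0))  # 2 or 3, depending on the points drawn
```

A point everyone shares, like `(8, 1)` here, reveals nothing, while a rare point such as bob's `(19, 4)` identifies him on its own. Real mobility traces contain many such rare points, which is why so few are needed.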

If someone can cross-reference that with, say, a person's Twitter feed, they could build a pretty good picture of who that person is.

The pattern largely held when the researchers "coarsened" the data by using less specific time observations and by lumping multiple cell-phone towers into one: uniqueness fell only slowly, roughly as the 1/10 power of the resolution.
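Coarsening can be pictured as a many-to-one mapping applied to every point in a trace: times are bucketed into wider windows, and towers are replaced by the id of the region they are lumped into. The sketch below is a hypothetical illustration, not the authors' procedure; the `(hour, tower_id)` encoding, the `coarsen` function, and the tower-to-region mapping are all made up. It shows why coarsening can lower uniqueness: two traces that differ at full resolution may become identical once the resolution drops.

```python
from typing import Dict, Set, Tuple

# Hypothetical encoding of one observation: (hour index, tower id).
Point = Tuple[int, int]


def coarsen(trace: Set[Point], hours_per_bin: int,
            region_of: Dict[int, int]) -> Set[Point]:
    """Lower a trace's resolution: bucket hours into windows of
    `hours_per_bin` hours and replace each tower by its region id.
    `region_of` must map every tower that appears in the trace."""
    return {(hour // hours_per_bin, region_of[tower]) for hour, tower in trace}


# Two made-up users who never share a point at full resolution.
alice = {(8, 1), (13, 5)}
bob = {(9, 2), (14, 6)}
# Hypothetical grouping: towers 1 and 2 form region 100, towers 5 and 6 region 200.
region_of = {1: 100, 2: 100, 5: 200, 6: 200}

print(alice == bob)                                                  # False
print(coarsen(alice, 4, region_of) == coarsen(bob, 4, region_of))   # True
```

After coarsening, an observer who knows these points can no longer tell alice from bob. The paper's finding is that, in real data, this kind of blurring protects people much less than one might hope.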

Evidently, the way people move through the world and share and communicate information is quite distinctive.

De Montjoye has said: "We use the analogy of the fingerprint. In the 1930s, Edmond Locard, one of the first forensic science pioneers, showed that each fingerprint is unique, and you need 12 points to identify it. So here what we did is we took a large-scale database of mobility traces and basically computed the number of points so that 95 percent of people would be unique in the dataset."

Since phone companies need to keep this kind of data for billing and customer-service purposes, it seems inevitable that sooner or later it will be put to questionable use by the security agencies of various governments.
 
The authors have an op-ed in the Christian Science Monitor arguing that consumers should be granted more control over and more information about how much of their data is being stored and for what purpose. 

Their study "shows that when it comes to rich metadata datasets, there are no clear cut between anonymous and not anonymous data. Achieving anonymity is really hard and might even be algorithmically impossible."

The paper may be found here:

http://www.nature.com/srep/2013/130325/srep01376/pdf/srep01376.pdf


About Me

I was a senior software developer at HP and GM. I am interested in intelligent and scientific computing. I am passionate about computers as enablers for human imagination. The contents of this site are not in any way, shape, or form endorsed, approved, or otherwise authorized by HP, its subsidiaries, or its officers and shareholders.
