A paper—“Robust De-anonymization of Large Sparse Datasets”—by Vitaly Shmatikov and Arvind Narayanan, originally published in 2008, has received an inaugural “Test of Time” award from the IEEE Symposium on Security and Privacy.
In response to the award, Shmatikov and Narayanan wrote a brief reply, “Robust De-anonymization of Large Sparse Datasets: a Decade Later,” in which they adduce “some lessons from the last decade of de-anonymization research.”
As they put it, “[p]erhaps the main lesson of our paper is that data collection has grown so comprehensive that de-anonymization need no longer rely on demographic attributes. Techniques for protecting against de-anonymization such as making a few attributes more coarse-grained break down for datasets of watched movies or browsing histories or visited locations when these datasets contain hundreds or thousands of observations per individual.”
“We and other researchers,” note Shmatikov and Narayanan, “have since demonstrated robust de-anonymization techniques in many other domains: social networks, genetic data, location data, credit card data, browsing histories, writing style, source code, and compiled binaries.” As a result, “[t]his line of research has firmly established that high-dimensional data is inherently vulnerable to de-anonymization.” Moreover, “[t]his is also supported by theoretical evidence.”
Shmatikov and Narayanan remain concerned that “[t]oday’s privacy regulations, including the GDPR, continue to put substantial weight on de-identification. Our key recommendation is that the burden of proof be on the data controller to affirmatively show that anonymized data cannot be linked to individuals, rather than on privacy advocates to show that linkage is possible.”
“If we want sophisticated privacy technologies to be adopted,” Shmatikov and Narayanan conclude, “we need to work on the sociotechnical infrastructures that minimize the gap between privacy guarantees and perception of privacy. Those infrastructures are sorely lacking today.”
Read the rest of their response at this link.
At the time of publication, in 2008, Narayanan was a doctoral candidate at the University of Texas at Austin. At present, after postdoctoral work at Stanford, he is an associate professor of computer science at Princeton. Vitaly Shmatikov is currently a professor at Cornell Tech and in the Computer Science Department at Cornell University. Prior to joining Cornell Tech, he worked at the University of Texas at Austin and SRI International. Shmatikov obtained his Ph.D. in computer science and M.S. in engineering-economic systems from Stanford.