A recent study by a group of Danish researchers is raising thorny questions about whether publicly available user data should nonetheless be treated with privacy and restraint in mind. Here are the key details as reported by Wired:

ON MAY 8, a group of Danish researchers publicly released a dataset of nearly 70,000 users of the online dating site OkCupid, including usernames, age, gender, location, what kind of relationship (or sex) they’re interested in, personality traits, and answers to thousands of profiling questions used by the site.

When asked whether the researchers attempted to anonymize the dataset, Aarhus University graduate student Emil O. W. Kirkegaard, who was lead on the work, replied bluntly: “No. Data is already public.”

For those concerned about privacy, research ethics, and the growing practice of publicly releasing large data sets, this logic of “but the data is already public” is an all-too-familiar refrain used to gloss over thorny ethical concerns. The most important, and often least understood, concern is that even if someone knowingly shares a single piece of information, big data analysis can publicize and amplify it in a way the person never intended or agreed.

The full Wired report is well worth a read, and the ground it covers resonates with Constellation Research VP and principal analyst Steve Wilson, who focuses on privacy and security matters.

"Privacy is not about secrecy," Wilson says. "Privacy is the protection you need when your affairs are not secret."

Kirkegaard "has committed the most basic category error in information science and privacy," Wilson adds. "'Public' is not the relevant quality of personal data. Rather, it is whether that data is identifiable or not. If identifiable, then privacy applies regardless of whether the data was in 'public.' Even if data is in public, people must exercise restraint. Privacy principles require that we not collect personal data that we don't really need; that we do not re-purpose personal information in unexpected ways; and that we tell people what we are doing with their personal information and why. The OkCupid researchers seem oblivious to these principles."

Then there's the fact that the definition of "public" data is rather ambiguous, Wilson adds. "Big data has provided extra special powers to uncover insights about individuals from public data. These powers bring great responsibilities."

The privacy aspects of public data might seem counterintuitive to technologists and data scientists, Wilson notes. "But intuitions are often flawed, and they need to reflect on the fact that legal restraints apply to all sorts of intangible, frequently 'public' assets. Consider intellectual property, minerals, or electromagnetic spectrum. These are all regulated despite being in plain view."
