Is anyone aware of good data anonymization software? Or perhaps a package for R that does data anonymization?

Thanks in advance,


asked Jul 26 '10 at 23:40

Anton Ballus

2 Answers:

It depends on whether you want to anonymize structured or unstructured data.

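For structured data, "k-anonymity: a model for protecting privacy" (Sweeney, 2002) introduced the k-anonymity model. A release is k-anonymous if every record is indistinguishable from at least k-1 other records on its quasi-identifying attributes (for example age, ZIP code, and sex), which limits re-identification by linking against external data. The goal is to reach that property without distorting the data more than necessary. (Also consider some of the derived work, such as l-diversity and t-closeness.)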
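To make the definition concrete, here is a minimal Python sketch that measures k for a small table before and after generalizing one column. The records, the column names, and the helpers `k_anonymity` and `generalize_age` are all made up for illustration; this is not code from Sweeney's paper or from any particular package.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the
    same values on the given quasi-identifier columns (0 if no records)."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0


def generalize_age(record, width=10):
    """Return a copy of the record with the exact age replaced by a range."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}


# Hypothetical toy data: "age" and "zip" are treated as quasi-identifiers,
# "diagnosis" as the sensitive attribute.
records = [
    {"age": 34, "zip": "02138", "diagnosis": "flu"},
    {"age": 36, "zip": "02138", "diagnosis": "asthma"},
    {"age": 41, "zip": "02139", "diagnosis": "flu"},
    {"age": 47, "zip": "02139", "diagnosis": "diabetes"},
]

raw_k = k_anonymity(records, ["age", "zip"])
generalized = [generalize_age(r) for r in records]
gen_k = k_anonymity(generalized, ["age", "zip"])

print(raw_k)  # 1: every (age, zip) pair is unique
print(gen_k)  # 2: each (age range, zip) pair now covers two records
```

Coarsening the age into ten-year ranges is what moves the table from k = 1 to k = 2. Real tools search for the least lossy generalization that reaches the target k, which this sketch does not attempt.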

For unstructured data, I would recommend taking a look at the deid software package, which is described in "Automated de-identification of free-text medical records" (Neamatullah et al., 2008). Although this package is geared towards removing protected health information (PHI), as defined by the HIPAA Privacy Rule, the underlying principles should be applicable to other domains.

answered Jul 27 '10 at 07:51

Thomas Brox Røst

Have a look at the Anonymization Toolkit from Cornell.

This answer is marked "community wiki".

answered Jul 27 '10 at 00:29

DirectedGraph


Also keep in mind that certain fields, such as timestamps, can introduce vulnerabilities into anonymization schemes.

(Jul 27 '10 at 01:01) DirectedGraph

Can't timestamps be anonymised?

(Jul 27 '10 at 18:43) Anton Ballus

did it by scaling and normalizing timestamps for the Flickr dataset.

(Jul 28 '10 at 02:36) DirectedGraph
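
For anyone wondering what "scaling and normalizing" timestamps could look like, here is one possible reading as a Python sketch: shift the timestamps so the earliest is zero, then divide by the total span so everything lands in [0, 1]. This is an assumption about the approach, not the code used for the Flickr dataset, and `normalize_timestamps` and the sample dates are made up.

```python
from datetime import datetime


def normalize_timestamps(timestamps):
    """Min-max scale datetimes to floats in [0, 1].

    The earliest timestamp maps to 0.0 and the latest to 1.0.
    If all timestamps are equal, every value maps to 0.0.
    """
    if not timestamps:
        return []
    start = min(timestamps)
    offsets = [(t - start).total_seconds() for t in timestamps]
    span = max(offsets)
    if span == 0:
        return [0.0 for _ in offsets]
    return [o / span for o in offsets]


# Hypothetical upload times, not taken from any real dataset.
uploads = [
    datetime(2010, 3, 1, 9, 0),
    datetime(2010, 3, 1, 15, 0),
    datetime(2010, 3, 2, 9, 0),
]

scaled = normalize_timestamps(uploads)
print(scaled)  # [0.0, 0.25, 1.0]
```

Note that this hides the absolute dates but keeps the ordering and the relative gaps between events, so it does not by itself remove the kind of vulnerability mentioned above.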





User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.