Is anyone aware of good data anonymization software? Or perhaps a package for R that does data anonymization?
Thanks in advance,
asked Jul 26 '10 at 23:40
It depends on whether you want to anonymize structured or unstructured data.
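For structured data, "k-anonymity: a model for protecting privacy" (Sweeney, 2002) introduced the k-anonymity model: a release is k-anonymous if every record is indistinguishable from at least k-1 other records on its quasi-identifying attributes. This limits re-identification by linkage with outside data while trying not to distort the data more than necessary; it is not an absolute guarantee. (Also consider some of the derived work, such as l-diversity and t-closeness.)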
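To make the k-anonymity idea concrete, here is a minimal Python sketch (not taken from Sweeney's paper or from any particular package). It measures k as the size of the smallest group of records sharing the same quasi-identifier values, then shows how coarsening those values raises k. The column names, the sample records, and the `generalize` helper are all made up for illustration.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group of records that share the same
    quasi-identifier values (0 for an empty table)."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values()) if groups else 0

def generalize(record):
    """Hypothetical generalization step: bucket age into decades and
    keep only the first three digits of the ZIP code."""
    out = dict(record)
    decade = record["age"] // 10 * 10
    out["age"] = f"{decade}-{decade + 9}"
    out["zip"] = record["zip"][:3] + "**"
    return out

# Made-up sample data; "diagnosis" is the sensitive attribute.
records = [
    {"age": 34, "zip": "14850", "diagnosis": "flu"},
    {"age": 36, "zip": "14853", "diagnosis": "asthma"},
    {"age": 41, "zip": "14850", "diagnosis": "flu"},
    {"age": 47, "zip": "14851", "diagnosis": "diabetes"},
]
qi = ["age", "zip"]

raw_k = k_anonymity(records, qi)          # every row is unique on (age, zip)
generalized = [generalize(r) for r in records]
gen_k = k_anonymity(generalized, qi)      # rows now fall into groups of two
print(raw_k, gen_k)
```

A real tool also has to search for the least damaging generalization that reaches the target k, which this sketch does not attempt.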
For unstructured data, I would recommend taking a look at the deid software package, which is described in "Automated de-identification of free-text medical records" (Neamatullah et al., 2008). Although this package is geared towards removing protected health information (PHI), as defined by the HIPAA Privacy Rule, the underlying principles should be applicable to other domains.
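As a rough illustration of pattern-based scrubbing of free text, here is a toy Python sketch. It is not deid's implementation and is nowhere near sufficient for real PHI removal; the patterns, the placeholder labels, the `scrub` function, and the sample note are all assumptions made up for this example.

```python
import re

# Hypothetical patterns for a few identifier types; a real system
# needs far broader coverage plus dictionaries of names and places.
PATTERNS = [
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),
    ("PHONE", re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")),
    ("MRN", re.compile(r"\bMRN[:\s]*\d+\b", re.IGNORECASE)),
]

def scrub(text, known_names=()):
    """Replace matches of each pattern, and any known names,
    with a bracketed placeholder."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    for name in known_names:
        text = re.sub(rf"\b{re.escape(name)}\b", "[NAME]", text,
                      flags=re.IGNORECASE)
    return text

# Made-up clinical note.
note = "Pt John Smith seen 03/14/2008, MRN: 445812. Call 607-555-0199."
clean = scrub(note, known_names=["John Smith"])
print(clean)
```

The same structure carries over to other domains by swapping in patterns and name lists for whatever counts as identifying there.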
answered Jul 27 '10 at 07:51
Thomas Brox Røst
Have a look at the Anonymization Toolkit from Cornell.
This answer is marked "community wiki".
answered Jul 27 '10 at 00:29