|
Is anyone aware of good data anonymization software? Or perhaps a package for R that does data anonymization? Thanks in advance, Anton |
|
It depends on whether you want to anonymize structured or unstructured data. For structured data, "k-anonymity: a model for protecting privacy" (Sweeney, 2002) introduced the k-anonymity model, which requires every record to be indistinguishable from at least k-1 other records on its quasi-identifying attributes. This limits the risk of re-identification without damaging the data more than necessary. (Also consider some of the derived work.) For unstructured data, I would recommend taking a look at the deid software package, which is described in "Automated de-identification of free-text medical records" (Neamatullah et al., 2008). Although this package is geared towards removing protected health information (PHI), as defined by the HIPAA Privacy Rule, the underlying principles should be applicable to other domains. |
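To make the k-anonymity idea above concrete, here is a minimal Python sketch. It is not taken from Sweeney's paper or from any existing package; the function names, the toy records, and the choice of `age` and `zip` as quasi-identifiers are all assumptions made for illustration. It measures k as the size of the smallest group of records that share the same quasi-identifier values, then shows how coarsening one attribute can raise k.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return the size of the smallest group of records that share
    the same values on all quasi-identifier columns."""
    groups = Counter(
        tuple(record[column] for column in quasi_identifiers)
        for record in records
    )
    return min(groups.values()) if groups else 0


def generalize_age(record, width=10):
    """Replace an exact age with a range of the given width (hypothetical helper)."""
    low = (record["age"] // width) * width
    return {**record, "age": f"{low}-{low + width - 1}"}


# Toy data, invented for this example.
records = [
    {"age": 23, "zip": "14850", "diagnosis": "flu"},
    {"age": 27, "zip": "14850", "diagnosis": "asthma"},
    {"age": 31, "zip": "14853", "diagnosis": "flu"},
    {"age": 36, "zip": "14853", "diagnosis": "diabetes"},
]
quasi_identifiers = ["age", "zip"]

# Every (age, zip) pair is unique in the raw data, so k is 1.
k_raw = k_anonymity(records, quasi_identifiers)

# After bucketing ages into decades, each (age range, zip) pair occurs twice.
generalized = [generalize_age(record) for record in records]
k_generalized = k_anonymity(generalized, quasi_identifiers)

print(k_raw, k_generalized)
```

A real tool would also have to decide which attributes to generalize and by how much, which is where most of the difficulty lies; this sketch only checks the property.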
|
Have a look at the Anonymization Toolkit from Cornell.
This answer is marked "community wiki".
Also keep in mind that certain fields, such as timestamps, can introduce vulnerabilities into anonymization schemes.
(Jul 27 '10 at 01:01)
DirectedGraph
Can't timestamps be anonymised?
(Jul 27 '10 at 18:43)
Anton Ballus
The project at http://socialnetworks.mpi-sws.org/ did it by scaling and normalizing the timestamps in its Flickr dataset.
(Jul 28 '10 at 02:36)
DirectedGraph
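For anyone wondering what "scaling and normalizing" timestamps could look like, here is a minimal Python sketch. It is only one plausible reading of the comment above, not the procedure the MPI-SWS project actually used; the function name and the sample values are made up. It shifts the timestamps so the earliest becomes 0 and divides by the total span, so the absolute dates disappear while the ordering and relative spacing remain.

```python
def normalize_timestamps(timestamps):
    """Map timestamps onto [0, 1]: the earliest becomes 0.0, the latest 1.0.

    Absolute dates are hidden, but order and relative gaps are preserved.
    """
    if not timestamps:
        return []
    start = min(timestamps)
    span = max(timestamps) - start
    if span == 0:
        # All events share one timestamp; nothing to scale.
        return [0.0 for _ in timestamps]
    return [(t - start) / span for t in timestamps]


# Invented Unix timestamps, in seconds.
raw = [1280192460, 1280256180, 1280284560]
normalized = normalize_timestamps(raw)
print(normalized)
```

Note that because relative gaps survive, someone who knows the real time of a few events may still be able to line the normalized values up with the originals, which is the kind of vulnerability mentioned earlier in this thread.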
|