Protecting data privacy is an important problem in microdata distribution. Anonymization algorithms typically aim to protect individual privacy, with minimal impact on the quality of the resulting data. While the bulk of previous work has measured quality through one-size-fits-all measures, we argue that quality is best judged with respect to the workload for which the data will ultimately be used. This paper provides a suite of anonymization algorithms that produce an anonymous view based on a target class of workloads, consisting of one or more data mining tasks, as well as selection predicates. An extensive experimental evaluation indicates that this approach is often more effective than previous anonymization techniques.

Categories and Subject Descriptors: H.2.8 [Database Management]: Database Applications
General Terms: Algorithms, Experimentation, Security
Keywords: Privacy, Anonymity, Data Recoding, Predictive Modeling
Kristen LeFevre, David J. DeWitt, Raghu Ramakrishnan