In this paper we observe that k-anonymizing a data set is strikingly similar to building a spatial index over the data set, so similar in fact that classical spatial indexing techniques can be used to anonymize data sets. We use this observation to leverage over 20 years of work on database indexing to provide efficient and dynamic anonymization techniques. Experiments with our implementation show that the R-tree index-based approach yields a batch anonymization algorithm that is orders of magnitude more efficient than previously proposed algorithms and has the advantage of supporting incremental updates. Finally, we show that the anonymizations generated by the R-tree approach do not sacrifice quality in their search for efficiency; in fact, by several previously proposed quality metrics, the compact partitioning properties of R-trees generate anonymizations superior to those generated by previously proposed anonymization algorithms.
Tochukwu Iwuchukwu, Jeffrey F. Naughton