An organization makes a new release as new information become available, releases a tailored view for each data request, releases sensitive information and identifying information separately. The availability of related releases sharpens the identification of individuals by a global quasi-identifier consisting of attributes from related releases. Since it is not an option to anonymize previously released data, the current release must be anonymized to ensure that a global quasi-identifier is not effective for identification. In this paper, we study the sequential anonymization problem under this assumption. A key question is how to anonymize the current release so that it cannot be linked to previous releases yet remains useful for its own release purpose. We introduce the lossy join, a negative property in relational database design, as a way to hide the join relationship among releases, and propose a scalable and practical solution. Categories and Subject Descriptors H.2.7 [Database...
Ke Wang, Benjamin C. M. Fung