The Boundary Between Privacy and Utility in Data Publishing


Download www.vldb.org

We consider the privacy problem in data publishing: given a database instance containing sensitive information, “anonymize” it to obtain a view such that, on the one hand, attackers cannot learn any sensitive information from the view, and on the other hand, legitimate users can use it to compute useful statistics. These are conflicting goals. In this paper we prove an almost crisp separation of the case when a useful anonymization algorithm is possible from the case when it is not, based on the attacker’s prior knowledge. Our definition of privacy is derived from existing literature and relates the attacker’s prior belief for a given tuple t to the posterior belief for the same tuple. Our definition of utility is based on the error bound on the estimates of counting queries. The main result has two parts. First, we show that if the prior beliefs for some tuples are large, then there exists no useful anonymization algorithm. Second, we show that when the prior is bounded for all tuples th...

Vibhor Rastogi, Sungho Hong, Dan Suciu
