Random data perturbation (RDP) has been in use for several years in statistical databases and public surveys as a means of providing privacy to individuals while collecting information on groups. It has recently gained popularity as a privacy technique in data mining. To our knowledge, attacks on binary RDP have not been completely characterized, its security has not been analyzed from a complexity-theoretic or information-theoretic perspective, and there is no privacy measure of binary RDP that is related to the complexity of an attack. We characterize all inference attacks on binary RDP and show that, if it is possible to reduce estimation error indefinitely, a finite number of queries per bit of entropy is enough to do so. We define this finite number as the privacy measure of binary RDP.
Poorvi L. Vora