Privacy-preserving data mining has been investigated extensively. Previous work falls mainly into two categories: perturbation- and randomization-based approaches, and secure multi-party computation-based approaches. Earlier perturbation and randomization approaches include a step that reconstructs the original data distribution. More recent research in this area adopts different data distortion methods or modifies the data mining techniques to make them better suited to the perturbation scenario. Secure multi-party computation approaches, which employ cryptographic tools to build data mining models, face high communication and computation costs, especially when the number of parties participating in the computation is large. In this paper, we propose a new perturbation-based technique. In our solution, we modify the data mining algorithms so that they can be used directly on the perturbed data. In other words, we directly build a classifier for the original data set from the perturbed trai...
Li Liu, Murat Kantarcioglu, Bhavani M. Thuraisingh