Privacy-preserving Distributed Clustering using Generative Models

15 years 11 months ago


We present a framework for clustering distributed data in unsupervised and semi-supervised scenarios, taking into account privacy requirements and communication costs. Rather than sharing parts of the original or perturbed data, we instead transmit the parameters of suitable generative models built at each local data site to a central location. We mathematically show that the best representative of all the data is a certain “mean” model, and empirically show that this model can be approximated quite well by generating artificial samples from the underlying distributions using Markov Chain Monte Carlo techniques, and then fitting a combined global model with a chosen parametric form to these samples. We also propose a new measure that quantifies privacy based on information theoretic concepts, and show that decreasing privacy leads to a higher quality of the combined model and vice versa. We provide empirical results on different data types to highlight the generality of our fr...
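The pipeline described in the abstract (fit a local model at each site, transmit only its parameters, draw artificial samples centrally, fit one global model) can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: each site fits a one-dimensional Gaussian, the central location samples from those Gaussians directly rather than with Markov Chain Monte Carlo, sites are weighted by their data counts, and the global model is a single Gaussian rather than a clustering model. The names `fit_gaussian` and `combine_local_models` are hypothetical.

```python
import random
import statistics


def fit_gaussian(values):
    """Maximum-likelihood 1-D Gaussian fit; returns (mean, std)."""
    return statistics.fmean(values), statistics.pstdev(values)


def combine_local_models(local_models, n_samples=20000, seed=0):
    """Fit one global Gaussian to artificial samples drawn from local models.

    local_models is a list of (mean, std, site_size) tuples. Only these
    parameters reach the central location, never the raw data.
    Assumption: direct Gaussian sampling stands in for MCMC here.
    """
    rng = random.Random(seed)
    weights = [size for _, _, size in local_models]
    samples = []
    for _ in range(n_samples):
        # Pick a site in proportion to its data count, then sample its model.
        mean, std, _ = rng.choices(local_models, weights=weights)[0]
        samples.append(rng.gauss(mean, std))
    return fit_gaussian(samples)


# Two hypothetical sites holding private data of different sizes.
data_rng = random.Random(1)
site_a = [data_rng.gauss(0.0, 1.0) for _ in range(500)]
site_b = [data_rng.gauss(4.0, 1.0) for _ in range(1500)]

# Each site shares only (mean, std, count).
local_models = [
    fit_gaussian(site_a) + (len(site_a),),
    fit_gaussian(site_b) + (len(site_b),),
]

global_mean, global_std = combine_local_models(local_models)
print(global_mean, global_std)
```

In this toy setting the fitted global mean should land near the size-weighted average of the site means, and the global spread should exceed either site's spread because it also absorbs the gap between the sites.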

Srujana Merugu, Joydeep Ghosh
