
DATAMINE
2006

Scalable Clustering Algorithms with Balancing Constraints

Clustering methods for data mining problems must be extremely scalable. In addition, several data mining applications demand that the clusters obtained be balanced, i.e., of approximately the same size or importance. In this paper, we propose a general framework for scalable, balanced clustering. The data clustering process is broken down into three steps: sampling a small representative subset of the points, clustering the sampled data, and populating the initial clusters with the remaining data, followed by refinements. First, we show that simple uniform sampling from the original data is sufficient to obtain a representative subset with high probability. While the proposed framework allows a large class of algorithms to be used for clustering the sampled set, we focus on some popular parametric algorithms for ease of exposition. We then present algorithms to populate and refine the clusters. The algorithm for populating the clusters is based on a generalization of the stable...
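The three steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the function name `balanced_cluster_sketch`, its parameters, the use of plain k-means on the sample, and the greedy capacity-limited populate step are all assumptions made for illustration. In particular, the greedy step is a simple stand-in for the populate algorithm that the truncated abstract only begins to describe, and the refinement step is omitted.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))


def balanced_cluster_sketch(points, k, sample_size, n_iter=20, seed=0):
    """Hypothetical sketch: sample, cluster the sample, populate under a size cap."""
    rng = random.Random(seed)
    n = len(points)

    # Step 1: uniform sampling without replacement.
    sample_idx = rng.sample(range(n), sample_size)
    sample = [points[i] for i in sample_idx]

    # Step 2: plain k-means (Lloyd iterations) on the sample only.
    centers = [list(p) for p in rng.sample(sample, k)]
    for _ in range(n_iter):
        groups = [[] for _ in range(k)]
        for p in sample:
            groups[nearest(p, centers)].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old center if a cluster went empty
                centers[j] = [sum(c) / len(g) for c in zip(*g)]

    # Step 3: populate. Sampled points keep their nearest center; every
    # remaining point goes to the closest center that still has room.
    # This greedy rule is an assumption, not the paper's populate algorithm.
    cap = -(-n // k)  # ceil(n / k): maximum allowed cluster size
    labels = [-1] * n
    counts = [0] * k
    for i in sample_idx:
        labels[i] = nearest(points[i], centers)
        counts[labels[i]] += 1
    rest = [i for i in range(n) if labels[i] == -1]
    # Handle points closest to some center first, so the clearest
    # assignments claim capacity before the ambiguous ones.
    rest.sort(key=lambda i: min(sq_dist(points[i], c) for c in centers))
    for i in rest:
        for j in sorted(range(k), key=lambda j: sq_dist(points[i], centers[j])):
            if counts[j] < cap:
                labels[i] = j
                counts[j] += 1
                break
    return labels, centers
```

Calling `balanced_cluster_sketch(points, k=3, sample_size=30)` on 300 points returns one label per point and the three centers. The size cap holds only as long as no sampled cluster already exceeds `ceil(n / k)`, which is the usual case when the sample is small relative to the data.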
Arindam Banerjee, Joydeep Ghosh
Added 11 Dec 2010
Updated 11 Dec 2010
Type Journal