
SIGMOD
1998
ACM

CURE: An Efficient Clustering Algorithm for Large Databases

Clustering, in data mining, is useful for discovering groups and identifying interesting distributions in the underlying data. Traditional clustering algorithms either favor clusters with spherical shapes and similar sizes, or are very fragile in the presence of outliers. We propose a new clustering algorithm called CURE that is more robust to outliers and identifies clusters having non-spherical shapes and wide variances in size. CURE achieves this by representing each cluster by a certain fixed number of points that are generated by selecting well-scattered points from the cluster and then shrinking them toward the center of the cluster by a specified fraction. Having more than one representative point per cluster allows CURE to adjust well to the geometry of non-spherical shapes, and the shrinking helps to dampen the effects of outliers. To handle large databases, CURE employs a combination of random sampling and partitioning. A random sample drawn from the data set is first partit...
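The representative-point idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the greedy farthest-point selection used to pick "well-scattered" points, and the default values `c=4` and `alpha=0.5` are assumptions made for illustration. The abstract only states that a fixed number of scattered points is chosen and then shrunk toward the cluster center by a specified fraction.

```python
import math


def centroid(points):
    """Mean of the points, taken here as the cluster 'center'."""
    n = len(points)
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / n for i in range(dim))


def scattered_points(points, c):
    """Pick up to c spread-out points with a greedy farthest-point heuristic.

    Assumption: start from the point farthest from the centroid, then keep
    adding the point whose nearest already-chosen point is farthest away.
    """
    distinct = list(dict.fromkeys(tuple(p) for p in points))  # drop duplicates
    center = centroid(distinct)
    chosen = [max(distinct, key=lambda p: math.dist(p, center))]
    while len(chosen) < min(c, len(distinct)):
        remaining = [p for p in distinct if p not in chosen]
        chosen.append(
            max(remaining, key=lambda p: min(math.dist(p, q) for q in chosen))
        )
    return chosen


def shrink(reps, center, alpha):
    """Move each representative a fraction alpha of the way to the center."""
    return [
        tuple(x + alpha * (m - x) for x, m in zip(p, center))
        for p in reps
    ]


def representatives(points, c=4, alpha=0.5):
    """Scattered points of one cluster, shrunk toward its centroid."""
    pts = [tuple(p) for p in points]
    return shrink(scattered_points(pts, c), centroid(pts), alpha)


if __name__ == "__main__":
    cluster = [(0, 0), (4, 0), (0, 4), (4, 4), (2, 2)]
    print(representatives(cluster, c=4, alpha=0.5))
```

With `alpha=0` the representatives stay on the cluster boundary, and with `alpha=1` they all collapse onto the centroid. Intermediate values pull boundary points, including possible outliers, inward while keeping several points to trace the cluster's shape.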
Sudipto Guha, Rajeev Rastogi, Kyuseok Shim
Added 05 Aug 2010
Updated 05 Aug 2010
Type Conference
Year 1998
Where SIGMOD
Authors Sudipto Guha, Rajeev Rastogi, Kyuseok Shim