Scalability for Clustering Algorithms Revisited


This paper presents a simple new algorithm that performs k-means clustering in one scan of a dataset, while using a buffer for points from the dataset of fixed size. Experiments show that the new method is several times faster than standard k-means, and that it produces clusterings of equal or almost equal quality. The new method is a simplification of an algorithm due to Bradley, Fayyad and Reina that uses several data compression techniques in an attempt to improve speed and clustering quality. Unfortunately, the overhead of these techniques makes the original algorithm several times slower than standard k-means on materialized datasets, even though standard k-means scans a dataset multiple times. Also, lesion studies show that the compression techniques do not improve clustering quality. All results hold for 400 megabyte synthetic datasets and for a dataset created from the real-world data used in the 1998 KDD data mining contest. All algorithm implementations and experiments are desig...
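The single-scan idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; it only assumes the general scheme the abstract describes: fill a fixed-size buffer, run k-means on the buffered points together with per-cluster summaries of earlier data, then fold the buffer into those summaries and discard it. The function name `single_pass_kmeans`, the parameters `n_iter` and `seed`, the random initialisation from the first buffer, and the choice to leave empty clusters untouched are all assumptions made for illustration.

```python
import numpy as np

def single_pass_kmeans(chunks, k, n_iter=10, seed=0):
    """Cluster a stream of buffer-sized chunks in one scan (illustrative sketch).

    chunks: iterable of (n_i, d) arrays, each standing in for one buffer fill.
    Returns (centers, counts): k centroids and the number of points each absorbed.
    """
    rng = np.random.default_rng(seed)
    centers = sums = counts = None

    for chunk in chunks:
        chunk = np.asarray(chunk, dtype=float)
        if centers is None:
            # Assumption: initialise from k random points of the first buffer.
            idx = rng.choice(len(chunk), size=k, replace=False)
            centers = chunk[idx].copy()
            sums = np.zeros_like(centers)   # per-cluster sum of discarded points
            counts = np.zeros(k)            # per-cluster count of discarded points

        for _ in range(n_iter):
            # Assign each buffered point to its nearest centre.
            dist = ((chunk[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            # Update centres from buffered points plus summaries of earlier data.
            new_sums = sums.copy()
            new_counts = counts.copy()
            np.add.at(new_sums, labels, chunk)
            np.add.at(new_counts, labels, 1.0)
            nonempty = new_counts > 0       # empty clusters keep their old centre
            centers[nonempty] = new_sums[nonempty] / new_counts[nonempty, None]

        # Compress: fold the whole buffer into the summaries, then discard it.
        sums, counts = new_sums, new_counts

    return centers, counts
```

Each chunk is read exactly once, and memory use is bounded by the buffer size plus `k` running sums and counts, which is the property the abstract emphasises. A caller would pass any iterable of arrays, for example slices read sequentially from a file.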

Fredrik Farnstrom, James Lewis, Charles Elkan

Dataset | K-means | SIGKDD 2000 | Standard K-means |

Post Info

Added	19 Dec 2010
Updated	19 Dec 2010
Type	Journal
Year	2000
Where	SIGKDD
Authors	Fredrik Farnstrom, James Lewis, Charles Elkan
