Fast and exact out-of-core and distributed k-means clustering

Clustering is one of the most widely studied topics in data mining, and k-means is one of the most popular clustering algorithms. K-means requires several passes over the entire dataset, which can make it very expensive for large disk-resident datasets. For this reason, much work has gone into approximate versions of k-means that require only one or a small number of passes over the entire dataset. In this paper, we present a new algorithm, called Fast and Exact K-means Clustering (FEKM), which typically requires only one or a small number of passes over the entire dataset and provably produces the same cluster centers as the original k-means algorithm. The algorithm uses sampling to create initial cluster centers, and then takes one or more passes over the entire dataset to adjust these cluster centers. We provide theoretical analysis to show that the cluster centers thus reported are the same as the ones computed by the original k-means algor...
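To make the cost argument concrete, the sketch below contrasts plain k-means, where every iteration is one full scan of the data, with a simplified "cluster a sample, then make one pass over everything" scheme. It is only an illustration of the structure the abstract describes and is not FEKM itself: the mechanism that makes FEKM provably exact is not given in the abstract and is not reproduced here, so `sample_then_refine` carries no exactness guarantee. All function names and parameters are hypothetical, chosen for this example.

```python
import random

def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    return min(range(len(centers)),
               key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centers[j])))

def one_pass(points, centers):
    """One full scan: assign each point, then return the recomputed centers."""
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in centers]
    counts = [0] * len(centers)
    for point in points:  # for disk-resident data, this loop is one pass over the file
        j = nearest(point, centers)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += point[d]
    # Keep the old center if a cluster received no points.
    return [[s / counts[j] for s in sums[j]] if counts[j] else list(centers[j])
            for j in range(len(centers))]

def kmeans(points, centers, max_passes=100):
    """Plain k-means (Lloyd's iteration); returns (centers, passes used)."""
    for passes in range(1, max_passes + 1):
        new_centers = one_pass(points, centers)
        if new_centers == centers:  # centers stopped moving
            return new_centers, passes
        centers = new_centers
    return centers, max_passes

def sample_then_refine(points, k, sample_size, seed=0):
    """Assumed simplification: cluster an in-memory sample, then adjust the
    resulting centers with a single pass over all points (no exactness check)."""
    rng = random.Random(seed)
    sample = rng.sample(points, sample_size)
    initial = [list(p) for p in sample[:k]]
    sample_centers, _ = kmeans(sample, initial)
    return one_pass(points, sample_centers)
```

Calling `kmeans(points, initial_centers)` reports how many full scans were needed, which is the quantity that becomes expensive when `points` lives on disk; `sample_then_refine(points, k, sample_size)` always scans the full dataset exactly once.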

Ruoming Jin, Anjan Goswami, Gagan Agrawal

Algorithm | Entire Dataset | K-means | KAIS 2006 |

Post Info

Added	13 Dec 2010
Updated	13 Dec 2010
Type	Journal
Year	2006
Where	KAIS
Authors	Ruoming Jin, Anjan Goswami, Gagan Agrawal
