K-means clustering versus validation measures: a data distribution perspective

K-means is a widely used partitional clustering method. While considerable research effort has gone into characterizing the key features of K-means clustering, further investigation is needed to reveal whether and how data distributions affect its performance. In this paper, we revisit the K-means clustering problem by answering three questions. First, how do the "true" cluster sizes affect the performance of K-means clustering? Second, is entropy an algorithm-independent validation measure for K-means clustering? Finally, what is the distribution of the clustering results produced by K-means? To that end, we first illustrate that K-means tends to generate clusters with relatively uniform sizes. In addition, we show that the entropy measure, an external clustering validation measure, favors clustering algorithms that tend to reduce high variation in cluster sizes. Fina...
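
The two quantities the abstract contrasts, variation in cluster sizes and entropy as an external validation measure, can be illustrated with a minimal Python sketch. The abstract does not define either quantity, so the formulas here are assumptions: size variation is taken as the coefficient of variation (sample standard deviation divided by mean) of the cluster sizes, and entropy is the conventional size-weighted average of per-cluster class entropies in bits. The function names `size_cv` and `clustering_entropy` are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from statistics import mean, stdev


def size_cv(labels):
    """Coefficient of variation of group sizes (assumed measure of size spread).

    Uses the sample standard deviation, so at least two groups are required.
    """
    sizes = list(Counter(labels).values())
    return stdev(sizes) / mean(sizes)


def clustering_entropy(true_labels, cluster_labels):
    """Size-weighted average of per-cluster class entropies, in bits.

    Lower is better; 0 means every cluster contains a single true class.
    """
    n = len(true_labels)
    # Count how many objects of each true class fall into each cluster.
    per_cluster = {}
    for truth, cluster in zip(true_labels, cluster_labels):
        per_cluster.setdefault(cluster, Counter())[truth] += 1

    total = 0.0
    for class_counts in per_cluster.values():
        n_j = sum(class_counts.values())
        # Entropy of the class distribution inside this cluster.
        e_j = -sum((m / n_j) * math.log2(m / n_j) for m in class_counts.values())
        total += (n_j / n) * e_j
    return total


# Illustrative toy data (made up, not from the paper): two true classes of
# very different sizes, and a partition that splits the objects evenly.
true_classes = ["a"] * 90 + ["b"] * 10
even_partition = [0] * 50 + [1] * 50

cv_true = size_cv(true_classes)
cv_found = size_cv(even_partition)
entropy_found = clustering_entropy(true_classes, even_partition)
```

Comparing `cv_true` with `cv_found` shows how much an evenly sized partition reduces the size variation of skewed true classes, and `entropy_found` gives the entropy score that partition receives.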

Hui Xiong, Junjie Wu, Jian Chen

Clustering Validation Measure | Data Mining | K-means Clustering Problem | KDD 2006 | Partitional Clustering Method

Post Info

Added	30 Nov 2009
Updated	30 Nov 2009
Type	Conference
Year	2006
Where	KDD
Authors	Hui Xiong, Junjie Wu, Jian Chen
