A Generalization of Proximity Functions for K-Means

K-means is a widely used partitional clustering method. Considerable effort has gone into finding better proximity (distance) functions for K-means. However, the common characteristics of these proximity functions remain unknown. To this end, in this paper, we show that all proximity functions that fit K-means clustering can be generalized as the K-means distance, which can be derived from a differentiable convex function. A general proof of the sufficient and necessary conditions for K-means distance functions is also provided. In addition, we reveal that K-means has a general uniformization effect; that is, K-means tends to produce clusters with relatively balanced sizes. This uniformization effect exists regardless of the proximity function used. Finally, we have conducted extensive experiments on various real-world data sets, and the results provide evidence of the uniformization effect. Also, we observed that external clustering validation measures, such as Entropy and Var...
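As an illustration of how a distance can be derived from a differentiable convex function, the sketch below uses the Bregman-style form phi(x) - phi(y) - <x - y, grad phi(y)>. This form is an assumption made for illustration; the abstract does not state the paper's exact definition of the K-means distance. The function names (`convex_derived_distance`, `sq_norm`, `neg_entropy`) and the sample vectors are hypothetical. With phi taken as the squared Euclidean norm, the construction yields the squared Euclidean distance; with phi taken as negative entropy, it yields the KL divergence for probability vectors.

```python
import numpy as np

def convex_derived_distance(phi, grad_phi, x, y):
    """Assumed Bregman-style form: phi(x) - phi(y) - <x - y, grad phi(y)>."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(phi(x) - phi(y) - np.dot(x - y, grad_phi(y)))

# Convex function 1: squared Euclidean norm.
def sq_norm(v):
    return np.dot(v, v)

def sq_norm_grad(v):
    return 2.0 * v

# Convex function 2: negative entropy (defined for strictly positive entries).
def neg_entropy(v):
    return np.sum(v * np.log(v))

def neg_entropy_grad(v):
    return np.log(v) + 1.0

# Hypothetical sample data: two probability vectors (each sums to 1).
x = np.array([0.2, 0.3, 0.5])
y = np.array([0.4, 0.4, 0.2])

d_euclid = convex_derived_distance(sq_norm, sq_norm_grad, x, y)
d_kl = convex_derived_distance(neg_entropy, neg_entropy_grad, x, y)

# Reference values computed directly from the familiar closed forms.
ref_euclid = float(np.sum((x - y) ** 2))
ref_kl = float(np.sum(x * np.log(x / y)))

print(d_euclid, ref_euclid)
print(d_kl, ref_kl)
```

Each printed pair should agree up to floating-point rounding, which suggests how a single convex-function construction can cover several familiar proximity functions at once.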

Junjie Wu, Hui Xiong, Jian Chen, Wenjun Zhou
