ML
2010
ACM

Stability and model selection in k-means clustering

Abstract: Clustering stability methods are a family of widely used model selection techniques for data clustering. Their unifying theme is that an appropriate model should result in a clustering which is robust with respect to various kinds of perturbations. Despite their relative success, not much is known theoretically about why or when they work, or even what kind of assumptions they make in choosing an 'appropriate' model. Moreover, recent theoretical work has shown that they might 'break down' for large enough samples. In this paper, we focus on the behavior of clustering stability using k-means clustering. Our main technical result is an exact characterization of the distribution to which suitably scaled measures of instability converge, based on a sample drawn from any distribution in R^n satisfying mild regularity conditions. From this, we can show that clustering stability does not 'break down' even for arbitrarily large samples, at least for the k-means framework. M...
Ohad Shamir, Naftali Tishby
Added 29 Jan 2011
Updated 29 Jan 2011
Type Journal
Year 2010
Where ML
Authors Ohad Shamir, Naftali Tishby