Spectral clustering (a.k.a. normalized graph cut) techniques have recently become popular for their potential to find irregularly shaped clusters in data. The input to these methods is a similarity measure between every pair of data points. If the clusters are well separated, the eigenvectors of the similarity matrix can be used to identify the clusters, essentially by identifying groups of points that are related by transitive similarity relationships. However, these techniques fail when the clusters are noisy and not well separated, or when the scale parameter that maps distances between points to similarities is not set correctly. Our approach to solving these problems is to introduce a generative probability model that explicitly models noise and can be trained in a maximum-likelihood fashion to estimate the scale parameter. Exact inference is computationally intractable, but we describe tractable, approximate techniques for inference and learning. Interesting...
Rómer Rosales, Brendan J. Frey
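
To make the baseline pipeline in the abstract concrete, the following is a minimal sketch of a conventional two-way spectral split, not the generative model the authors propose. It assumes a Gaussian kernel for mapping distances to similarities, with `sigma` playing the role of the scale parameter, and a symmetrically normalized similarity matrix. The function names `gaussian_affinity` and `two_way_spectral_split` are hypothetical, chosen only for illustration.

```python
import numpy as np


def gaussian_affinity(points, sigma):
    """Map pairwise Euclidean distances to similarities.

    Assumption: a Gaussian kernel, exp(-d^2 / (2 * sigma^2)), where sigma
    is the scale parameter that has to be chosen by hand in this sketch.
    """
    diffs = points[:, None, :] - points[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def two_way_spectral_split(points, sigma):
    """Split points into two groups using one eigenvector of the
    symmetrically normalized similarity matrix D^{-1/2} W D^{-1/2}."""
    W = gaussian_affinity(points, sigma)
    degree = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    M = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # eigh returns eigenvalues in ascending order, so column -2 holds the
    # eigenvector for the second-largest eigenvalue.
    _, eigvecs = np.linalg.eigh(M)
    second = eigvecs[:, -2] * d_inv_sqrt

    # Threshold at zero: points with the same sign share a cluster label.
    return (second > 0).astype(int)


if __name__ == "__main__":
    # Two small, well-separated groups of points (illustrative data only).
    group_a = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [0.5, 0.5]])
    group_b = group_a + np.array([4.0, 0.0])
    points = np.vstack([group_a, group_b])
    print(two_way_spectral_split(points, sigma=1.0))
```

Which group receives label 0 or 1 depends on the arbitrary sign of the eigenvector, so only the grouping is meaningful. The result also depends on `sigma`: a value far too small or far too large for the data can blur or destroy the split, which is the sensitivity the abstract points to.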