We address the problem of combining multiple data partitions, the set of which we call a clustering ensemble. We use a recent clustering approach, Spectral Clustering, and the classical K-Means algorithm to produce the partitions that constitute the clustering ensembles. We compare several combination methods by measuring the consistency between the combined data partition and (a) ground truth information, and (b) the clustering ensemble. Two consistency measures are used: (i) an index based on cluster matching between two partitions; and (ii) an information-theoretic index based on the mutual information between data partitions. Results on a variety of synthetic and real data sets show that, while combined partitions are more robust than individual clusterings, no combination method proves to be a clear winner. Furthermore, without the use of a priori information, the mutual information based measure is not able to systematical...
André Lourenço, Ana L. N. Fred
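
As an illustration of the information-theoretic consistency measure mentioned in the abstract, the following minimal sketch computes a normalized mutual information (NMI) between two partitions, and its average between a combined partition and every partition of a clustering ensemble. The normalization 2I(A;B) / (H(A) + H(B)) is an assumption made here for illustration; it is one common choice and may differ from the exact index used in the paper. The function names (`entropy`, `mutual_information`, `nmi`, `average_nmi`) are hypothetical.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a partition given as a list of cluster labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two partitions of the same samples."""
    n = len(a)
    count_a = Counter(a)
    count_b = Counter(b)
    joint = Counter(zip(a, b))  # co-occurrence counts of label pairs
    return sum(
        (nij / n) * log(nij * n / (count_a[i] * count_b[j]))
        for (i, j), nij in joint.items()
    )


def nmi(a, b):
    """Normalized mutual information, assumed here to be 2*I(A;B) / (H(A) + H(B))."""
    if len(a) != len(b):
        raise ValueError("partitions must label the same number of samples")
    total_entropy = entropy(a) + entropy(b)
    if total_entropy == 0:
        # Both partitions place all samples in a single cluster: treat as identical.
        return 1.0
    return 2 * mutual_information(a, b) / total_entropy


def average_nmi(combined, ensemble):
    """Mean NMI between a combined partition and each partition in the ensemble."""
    return sum(nmi(combined, p) for p in ensemble) / len(ensemble)
```

Under this definition, two partitions that differ only by a relabeling of clusters score 1, and statistically independent partitions score 0. Measuring consistency against ground truth amounts to calling `nmi` with the true labels, while consistency with the clustering ensemble corresponds to `average_nmi`.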