The adjusted Rand index is used to measure diversity in cluster ensembles, and a diversity measure is subsequently proposed. Although the measure was found to be related to the quality of the ensemble, the relationship appeared to be non-monotonic: in some cases, ensembles with a moderate level of diversity gave a more accurate clustering. Based on this, a procedure for building a cluster ensemble of a chosen type is proposed (assuming that the ensemble relies on one or more random parameters): generate a small random population of cluster ensembles, calculate the diversity of each ensemble, and select the ensemble with the median diversity. We demonstrate the advantages of both the measure and the procedure on 5 data sets and carry out statistical comparisons involving two diversity measures for cluster ensembles from the recent literature. An experiment with 9 data sets was also carried out to examine how the diversity-based selection procedure fares on ensembles of var...
Stefan Todorov Hadjitodorov, Ludmila I. Kuncheva,
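
The selection procedure described in the abstract (generate a small population of ensembles, score each by diversity, keep the one with the median score) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact form of the diversity measure, so the sketch assumes diversity is one minus the mean pairwise adjusted Rand index between ensemble members. The function names (`ensemble_diversity`, `select_median_diversity`, `toy_ensemble`) and the noisy-copy generator are hypothetical stand-ins for whatever randomised clusterer an actual ensemble would use.

```python
import random
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two labelings of the same objects."""
    n = len(a)
    if n < 2:
        return 1.0  # no object pairs to disagree on
    sum_cells = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions are trivial
    return (sum_cells - expected) / (maximum - expected)


def ensemble_diversity(ensemble):
    """ASSUMPTION: diversity = 1 - mean pairwise ARI over ensemble members.

    The ensemble is a list of at least two labelings of the same objects.
    """
    pairs = list(combinations(ensemble, 2))
    mean_ari = sum(adjusted_rand_index(p, q) for p, q in pairs) / len(pairs)
    return 1.0 - mean_ari


def select_median_diversity(population):
    """Return (index of the median-diversity ensemble, all diversities).

    For an even-sized population the upper of the two middle values is used.
    """
    diversities = [ensemble_diversity(e) for e in population]
    order = sorted(range(len(population)), key=lambda i: diversities[i])
    return order[len(order) // 2], diversities


def toy_ensemble(base, n_members, noise, k, rng):
    """Hypothetical stand-in for a randomised clusterer: each member is a
    copy of `base` in which every label is redrawn with probability `noise`."""
    return [
        [rng.randrange(k) if rng.random() < noise else label for label in base]
        for _ in range(n_members)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    base = [i % 3 for i in range(60)]
    # A small random population of ensembles with varying noise levels.
    population = [toy_ensemble(base, 5, rng.uniform(0.05, 0.6), 3, rng)
                  for _ in range(7)]
    chosen, diversities = select_median_diversity(population)
    print("diversities:", [round(d, 3) for d in diversities])
    print("selected ensemble index:", chosen)
```

Swapping in a different diversity measure only requires replacing `ensemble_diversity`; the median-selection step does not depend on how the score is computed.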