Background: Biological information is commonly used to cluster or classify entities of interest such as genes, conditions, species or samples. However, different sources of data can be used to classify the same set of entities and methods allowing the comparison of the performance of two data sources or the determination of how well a given classification agrees with another are frequently needed, especially in the absence of a universally accepted "gold standard" classification. Results: Here, we describe a novel measure – the Ranked Adjusted Rand (RAR) index. RAR differs from existing methods by evaluating the extent of agreement between any two groupings, taking into account the intercluster distances. This characteristic is relevant to evaluate cases of pairs of entities grouped in the same cluster by one method and separated by another. The latter method may assign them to close neighbour clusters or, on the contrary, to clusters that are far apart from each other. RA...
Francisco R. Pinto, João A. Carriço,