Information theoretic measures for clusterings comparison: is a correction for chance necessary?


Information theoretic based measures form a fundamental class of similarity measures for comparing clusterings, alongside the classes of pair-counting based and set-matching based measures. In this paper, we discuss the necessity of a correction for chance for information theoretic based measures for clusterings comparison. We observe that the baseline for such measures, i.e., the average value between random partitions of a data set, does not take on a constant value, and tends to have larger variation when the ratio between the number of data points and the number of clusters is small. A similar effect occurs in some non-information theoretic based measures, such as the well-known Rand Index. Assuming a hypergeometric model of randomness, we derive the analytical formula for the expected mutual information value between a pair of clusterings, and then propose the adjusted version for several popular information theoretic based measures. Some examples are given to demonstrate the need and us...
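The correction described in the abstract can be illustrated with a minimal sketch. It assumes the usual hypergeometric (fixed cluster sizes, random assignment) form of the expected mutual information, and it assumes normalization by the larger of the two entropies, which is only one of the several adjusted variants the abstract alludes to. The function names are hypothetical and not taken from the paper.

```python
from collections import Counter
from math import comb, log


def entropy(labels):
    """Shannon entropy (in nats) of a clustering given as a label list."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(u, v):
    """Mutual information (in nats) between two clusterings of the same items."""
    n = len(u)
    a, b = Counter(u), Counter(v)
    joint = Counter(zip(u, v))  # contingency table, non-empty cells only
    return sum(
        (nij / n) * log(n * nij / (a[i] * b[j]))
        for (i, j), nij in joint.items()
    )


def expected_mutual_information(u, v):
    """Expected MI when cluster sizes are fixed and each contingency cell
    follows a hypergeometric distribution (assumed model of randomness)."""
    n = len(u)
    emi = 0.0
    for ai in Counter(u).values():
        for bj in Counter(v).values():
            # A cell count of 0 contributes nothing, so start at 1.
            for nij in range(max(1, ai + bj - n), min(ai, bj) + 1):
                prob = comb(bj, nij) * comb(n - bj, ai - nij) / comb(n, ai)
                emi += prob * (nij / n) * log(n * nij / (ai * bj))
    return emi


def adjusted_mutual_information(u, v):
    """Chance-adjusted MI: (MI - E[MI]) / (max(H(U), H(V)) - E[MI]).
    The max-entropy upper bound is an assumption of this sketch."""
    mi = mutual_information(u, v)
    emi = expected_mutual_information(u, v)
    denom = max(entropy(u), entropy(v)) - emi
    if denom == 0:
        return 1.0  # degenerate case; this return value is a convention of the sketch
    return (mi - emi) / denom


if __name__ == "__main__":
    u = [0, 0, 1, 1, 2, 2]
    print(expected_mutual_information(u, u))  # chance baseline, well above zero
    print(adjusted_mutual_information(u, u))
```

With six points in three clusters, the chance baseline is a large fraction of the maximum attainable mutual information, which is the small points-per-cluster regime the abstract flags as problematic.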

Xuan Vinh Nguyen, Julien Epps, James Bailey


ICML 2009 | Machine Learning | Set-matching Based Measures | Similarity Measures | Theoretic Based Measures


Post Info

Added	17 Nov 2009
Updated	17 Nov 2009
Type	Conference
Year	2009
Where	ICML
Authors	Xuan Vinh Nguyen, Julien Epps, James Bailey
