Abstract— Some of the established approaches to evaluating text clustering algorithms for information retrieval exhibit theoretical flaws. In this paper, we analyze these flaws and introduce a new evaluation measure that overcomes them. Based on a simple yet rigorous mathematical analysis of the effect of certain parameters in cluster-based retrieval, we show that some conclusions drawn in the recent literature must be taken with a grain of salt. Our new measure, in contrast, accounts for the statistical biases that our analysis predicts. A series of experiments and a comparison with recently reported results underline that this measure is a more suitable performance indicator and allows for more meaningful interpretations.