Recent work has demonstrated that pairwise object similarity can be assessed axiomatically using information theory. We extend this concept specifically to document similarity and test the effectiveness of an information-theoretic measure of pairwise document similarity. We adapt a query-retrieval task to evaluate the quality of document similarity measures, and we show that our proposed information-theoretic measure yields statistically significant improvements over other popular similarity measures.
Categories and Subject Descriptors: H.3.3 [Information Search and Retrieval]: Clustering
General Terms: Theory, Experimentation
Javed A. Aslam, Meredith Frost
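The abstract does not spell out the measure itself. As a rough illustration only, the sketch below assumes a formulation in the spirit of Lin's information-theoretic definition of similarity: a term's information content is taken to be I(w) = -log pi(w), where pi(w) is the fraction of corpus documents containing w. Similarity is then twice the information content of what two documents share (the overlap of their normalized term distributions) divided by the total information content of both. This formulation is an assumption and may differ from the measure the paper actually evaluates; the function names `doc_frequencies`, `term_distribution`, and `it_similarity` are hypothetical.

```python
import math
from collections import Counter

def doc_frequencies(corpus):
    """pi(w): fraction of documents in the corpus that contain term w (assumed definition)."""
    n = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(doc))  # count each term at most once per document
    return {w: c / n for w, c in df.items()}

def term_distribution(doc):
    """Normalized term frequencies of one tokenized document."""
    counts = Counter(doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def it_similarity(doc_a, doc_b, pi):
    """Hypothetical Lin-style similarity: 2 * shared information / total information.

    Assumes every term of doc_a and doc_b appears in pi. Terms found in every
    document have pi(w) = 1 and therefore carry zero information.
    """
    pa = term_distribution(doc_a)
    pb = term_distribution(doc_b)
    info = {w: -math.log(pi[w]) for w in pa.keys() | pb.keys()}  # I(w) >= 0
    # Information content of the shared description (overlap of the two distributions).
    common = sum(min(pa[w], pb[w]) * info[w] for w in pa.keys() & pb.keys())
    # Information content needed to describe each document separately.
    total = sum(p * info[w] for w, p in pa.items()) + sum(p * info[w] for w, p in pb.items())
    if total == 0:
        return 0.0  # degenerate case: both documents contain only ubiquitous terms
    return 2 * common / total

# Toy usage on a hypothetical three-document corpus of pre-tokenized text.
corpus = [["cat", "sat", "mat"], ["cat", "ate", "fish"], ["dog", "sat", "log"]]
pi = doc_frequencies(corpus)
print(it_similarity(corpus[0], corpus[2], pi))
```

Under this assumed formulation the score lies in [0, 1], is symmetric, equals 1 for identical documents, and equals 0 for documents with disjoint vocabularies; rare shared terms raise the score more than common ones because they carry more information.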