Previous research on cluster-based retrieval has been inconclusive as to whether it improves retrieval effectiveness over document-based retrieval. Recent developments in the language modeling approach to IR have motivated us to re-examine this problem within this new retrieval framework. We propose two new models for cluster-based retrieval and evaluate them on several TREC collections. We show that cluster-based retrieval can perform consistently across collections of realistic size, and that significant improvements over document-based retrieval can be obtained in a fully automatic manner, without relevance information provided by humans.

Categories and Subject Descriptors
H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval – Retrieval models.

General Terms: Theory, Experimentation
Xiaoyong Liu, W. Bruce Croft