Maximizing only the relevance between queries and documents does not satisfy users who want the top search results to cover a wide range of topics through a few representative documents. In this paper, we propose two new metrics for evaluating information retrieval performance: diversity, which measures the topic coverage of a group of documents, and information richness, which measures the amount of information contained in a document. We then present a novel ranking scheme, Affinity Rank, which uses these two metrics to improve search results. We demonstrate how Affinity Rank works on a toy data set and verify our method with experiments on real-world data sets.

Categories & Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - retrieval models, search process; H.2.8 [Database Management]: Database Applications - Data Mining

General Terms: Algorithms, Performance
Yi Liu, Benyu Zhang, Zheng Chen, Michael R. Lyu, W