Evaluation measures act as objective functions to be optimized by information retrieval systems. Such objective functions must accurately reflect user requirements, particularly when tuning IR systems and learning ranking functions. Ambiguity in queries and redundancy in retrieved documents are poorly reflected by current evaluation measures. In this paper, we present a framework for evaluation that systematically rewards novelty and diversity. We develop this framework into a specific evaluation measure, based on cumulative gain. We demonstrate the feasibility of our approach using a test collection based on the TREC question answering track. Categories and Subject Descriptors H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval--Retrieval models General Terms Experimentation, Human Factors, Measurement
Charles L. A. Clarke, Maheedhar Kolla, Gordon V. C