The need to evaluate systems over a large number of topics (queries) makes IR evaluation a difficult task. In this paper, we study a topic selection problem for IR evaluation. The selection criterion is based on the overall difficulty of the chosen set of topics, as well as the uncertainty of the final IR metric applied to the systems. Our preliminary experiments demonstrate that our approach helps to identify a set of topics that provides confident estimates of systems’ performance while satisfying the query difficulty requirement.

Categories and Subject Descriptors
H.3 [Information Storage and Retrieval]: H.3.3 Information Search and Retrieval

General Terms
Experimentation, Measurement, Performance
Jianhan Zhu, Jun Wang, Vishwa Vinay, Ingemar J. Co