
Score standardization for inter-collection comparison of retrieval systems

The goal of system evaluation in information retrieval has always been to determine which of a set of systems is superior on a given collection. The tool used to determine system ordering is an evaluation metric such as average precision, which computes relative, collection-specific scores. We argue that a broader goal is achievable. In this paper we demonstrate that, by use of standardization, scores can be made substantially independent of a particular collection, allowing systems to be compared even when they have been tested on different collections. Compared to current methods, our techniques provide richer information about system performance, improved clarity in outcome reporting, and greater simplicity in reviewing results from disparate sources.

Categories and Subject Descriptors: H.3.4 [Information Storage and Retrieval]: Systems and software--performance evaluation.

Keywords: Retrieval experiment, evaluation, average precision, system measurement

General Terms: Measurement, perform...
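The abstract does not spell out the standardization step itself. A natural reading is per-topic z-scoring: each system's raw score on a topic is shifted and scaled by the mean and standard deviation of a reference set of runs on that same topic, so scores from different topics, and hence different collections, sit on a common scale. Below is a minimal sketch under that assumption; the function name and data layout are illustrative, not taken from the paper.

    import statistics

    def standardize(scores_by_topic):
        # scores_by_topic: {topic_id: {system_id: raw score, e.g. average precision}}
        # Returns the same structure with each raw score replaced by its
        # per-topic z-score, (score - topic mean) / topic standard deviation,
        # where mean and deviation are taken over the reference systems
        # evaluated on that topic.
        standardized = {}
        for topic, runs in scores_by_topic.items():
            values = list(runs.values())
            mean = statistics.mean(values)
            sd = statistics.stdev(values)  # sample standard deviation
            standardized[topic] = {sys: (s - mean) / sd
                                   for sys, s in runs.items()}
        return standardized

    # Toy example: two topics, three reference systems.
    raw = {
        "t1": {"A": 0.30, "B": 0.50, "C": 0.40},
        "t2": {"A": 0.05, "B": 0.15, "C": 0.10},
    }
    z = standardize(raw)
    # System B sits one standard deviation above the per-topic mean on both
    # topics (z = 1.0), even though its raw scores differ by a factor of three.

Any further step, such as mapping standardized scores into a bounded range for reporting, is omitted here; this sketch shows only the basic shift-and-scale idea the title refers to.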
Added 15 Dec 2010
Updated 15 Dec 2010
Type Conference
Year 2008
Where SIGIR
Publisher ACM
Authors William Webber, Alistair Moffat, Justin Zobel