We introduce and validate bootstrap techniques to compute confidence intervals that quantify the effect of test-collection variability on the average precision (AP) and mean average precision (MAP) IR effectiveness measures. We consider the test collection in IR evaluation to be representative of a population of materially similar collections, whose documents are drawn from an infinite pool with similar characteristics. Our model accurately predicts the degree of concordance between system results on randomly selected halves of the TREC-6 ad hoc corpus. We advance a framework for statistical evaluation that uses the same general model to account for other sources of chance variation, providing input for meta-analysis techniques.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Systems and Software – performance evaluation

General Terms
Experimentation, Measurement

Keywords
bootstrap, confidence interval, precision
Gordon V. Cormack, Thomas R. Lynam
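To illustrate the general idea of a bootstrap confidence interval for MAP, here is a minimal sketch using a generic percentile bootstrap over per-topic AP scores. This is an assumption-laden illustration of the technique, not necessarily the authors' exact resampling procedure (which models collection variability); the function name and parameters are hypothetical.

```python
import random

def bootstrap_map_ci(ap_scores, n_boot=10000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for MAP.

    ap_scores: per-topic average precision values for one system.
    Resampling with replacement treats the observed topics as a
    sample from a population of materially similar evaluations.
    NOTE: illustrative sketch only, not the paper's exact method.
    """
    rng = random.Random(seed)
    n = len(ap_scores)
    # Recompute MAP on each bootstrap resample, then sort.
    maps = sorted(
        sum(rng.choice(ap_scores) for _ in range(n)) / n
        for _ in range(n_boot)
    )
    # Take the alpha/2 and 1 - alpha/2 empirical quantiles.
    lo = maps[int((alpha / 2) * n_boot)]
    hi = maps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

With identical per-topic scores the interval collapses to a point; with varied scores it brackets the observed MAP.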