Information retrieval system evaluation: effort, sensitivity, and reliability

The effectiveness of information retrieval systems is measured by comparing performance on a common set of queries and documents. Significance tests are often used to evaluate the reliability of such comparisons. Previous work has examined such tests, but produced results with limited application. Other work established an alternative benchmark for significance, but the resulting test was too stringent. In this paper, we revisit the question of how such tests should be used. We find that the t-test is highly reliable (more so than the sign or Wilcoxon test), and is far more reliable than simply showing a large percentage difference in effectiveness measures between IR systems. Our results show that past empirical work on significance tests overestimated the error of such tests. We also re-consider comparisons between the reliability of precision at rank 10 and mean average precision, arguing that past comparisons did not consider the assessor effort required to compute such measures. ...
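As a rough illustration of the kind of comparison the abstract describes, the sketch below computes precision at rank 10 and average precision for a ranked list, then compares two systems over the same topics with a paired t statistic and an exact two-sided sign test. It is not the authors' code or experimental setup: the function names are hypothetical, relevance judgments are assumed to be binary, and only the Python standard library is used. Turning the t statistic into a p-value requires the t distribution with n - 1 degrees of freedom, which is left out here.

```python
import math
from statistics import mean, stdev


def precision_at_k(relevance, k=10):
    """Fraction of the top-k ranked documents that are relevant.

    `relevance` is a list of 0/1 judgments in rank order (an assumption).
    """
    return sum(relevance[:k]) / k


def average_precision(relevance, num_relevant):
    """Average of the precision values at each rank holding a relevant document.

    `num_relevant` is the total number of relevant documents for the topic,
    including any that were never retrieved.
    """
    hits, total = 0, 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / num_relevant if num_relevant else 0.0


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic over per-topic score differences (df = n - 1).

    Raises ZeroDivisionError if all differences are identical.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))


def sign_test_p_value(scores_a, scores_b):
    """Exact two-sided sign test p-value; tied topics are discarded."""
    diffs = [a - b for a, b in zip(scores_a, scores_b) if a != b]
    n = len(diffs)
    wins = sum(d > 0 for d in diffs)
    k = min(wins, n - wins)
    # Probability of a split at least this uneven under a fair coin.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

Mean average precision for a system is the mean of `average_precision` over all topics. Both tests take the two systems' per-topic scores in the same topic order, for example `paired_t_statistic(ap_system_a, ap_system_b)`.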

Mark Sanderson, Justin Zobel

Assessor Effort | Mean Average Precision | SIGIR 2005 | Significance Tests

Post Info

Added	26 Jun 2010
Updated	26 Jun 2010
Type	Conference
Year	2005
Where	SIGIR
Authors	Mark Sanderson, Justin Zobel
