A comparison of statistical significance tests for information retrieval evaluation

Information retrieval (IR) researchers commonly use three tests of statistical significance: the Student's paired t-test, the Wilcoxon signed rank test, and the sign test. Other researchers have previously proposed using both the bootstrap and Fisher's randomization (permutation) test as nonparametric significance tests for IR, but these tests have seen little use. For each of these five tests, we took the ad-hoc retrieval runs submitted to TRECs 3 and 5-8, and for each pair of runs, we measured the statistical significance of the difference in their mean average precision. We discovered that there is little practical difference between the randomization, bootstrap, and t tests. Both the Wilcoxon and sign tests have a poor ability to detect significance and have the potential to lead to false detections of significance. The Wilcoxon and sign tests are simplified variants of the randomization test and their use should be discontinued for measuring the significance of a differen...
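As an illustration of the kind of test the abstract discusses, the following is a minimal sketch of a paired randomization (permutation) test on the difference in mean per-topic scores. It is not the authors' implementation: the function name `paired_randomization_test`, its parameters, and the Monte Carlo sampling of sign flips are assumptions made here for illustration, and any scores passed to it would be hypothetical per-topic average precision values.

```python
import random


def paired_randomization_test(scores_a, scores_b, num_samples=10000, seed=0):
    """Approximate a two-sided paired randomization test.

    scores_a and scores_b hold one score per topic (for example, average
    precision) for two runs over the same topics. The return value is the
    fraction of sampled relabelings whose absolute mean difference is at
    least as large as the observed one. All names here are illustrative.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both runs need one score per topic")
    if not scores_a:
        raise ValueError("at least one topic is required")

    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)

    extreme = 0
    for _ in range(num_samples):
        # Under the null hypothesis the run labels are exchangeable within
        # each topic, so every per-topic difference keeps or flips its sign
        # with equal probability.
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(permuted / n) >= observed:
            extreme += 1
    return extreme / num_samples
```

Sampling sign flips only approximates the exact test, which would enumerate all 2^n relabelings; the fixed seed is there just to make the sketch reproducible.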

Mark D. Smucker, James Allan, Ben Carterette

CIKM 2007 | Information Management | Sign Tests | Statistical Significance | Wilcoxon Signed Rank Test

Added	13 Aug 2010
Updated	13 Aug 2010
Type	Conference
Year	2007
Where	CIKM
Authors	Mark D. Smucker, James Allan, Ben Carterette
