
CIKM
2007
Springer

Hypothesis testing with incomplete relevance judgments

Information retrieval experimentation generally proceeds in a cycle of development, evaluation, and hypothesis testing. Ideally, the evaluation and testing phases should be short and easy, so as to maximize the amount of time spent in development. There has been recent work on reducing the amount of assessor effort needed to evaluate retrieval systems, but it has not, for the most part, investigated the effects of these methods on tests of significance. In this work, we explore in detail the effects of reduced sets of judgments on the sign test. We demonstrate both analytically and empirically the relationship between the power of the test, the number of topics evaluated, and the number of judgments available. Using these relationships, we can determine the number of topics and judgments needed for the least-cost but highest-confidence significance evaluation. Specifically, testing pairwise significance over 192 topics with fewer than 5 judgments for each is as good as testing signifi...
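The abstract ties the power of the sign test to the number of topics evaluated. The sketch below illustrates that relationship using the textbook exact two-sided sign test. It is not the authors' analysis, and it does not model incomplete judgments. It assumes per-topic outcomes are independent, that ties have already been dropped, and that system A beats system B on each topic with some true probability `q`. The function names (`sign_test_p_value`, `critical_wins`, `sign_test_power`) are hypothetical and chosen for illustration.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n Bernoulli(p) trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def sign_test_p_value(wins: int, losses: int) -> float:
    """Exact two-sided sign test over per-topic wins/losses (ties dropped beforehand)."""
    n = wins + losses
    if n == 0:
        return 1.0
    k = min(wins, losses)
    # P(X <= k) under Binomial(n, 0.5), doubled for the two-sided test, capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


def critical_wins(n: int, alpha: float = 0.05):
    """Smallest win count c > n/2 whose two-sided p-value is <= alpha, or None."""
    for w in range(n + 1):
        if w > n - w and sign_test_p_value(w, n - w) <= alpha:
            return w
    return None  # too few topics to ever reach significance at this alpha


def sign_test_power(n: int, q: float, alpha: float = 0.05) -> float:
    """Power of the two-sided sign test with n topics when A wins each topic w.p. q."""
    c = critical_wins(n, alpha)
    if c is None:
        return 0.0
    # Reject H0 when wins >= c or wins <= n - c (the two regions do not overlap).
    upper = sum(binom_pmf(k, n, q) for k in range(c, n + 1))
    lower = sum(binom_pmf(k, n, q) for k in range(0, n - c + 1))
    return upper + lower


if __name__ == "__main__":
    # Illustrative only: power grows with the number of topics for a fixed effect size.
    for n_topics in (10, 25, 50, 100):
        print(n_topics, round(sign_test_power(n_topics, q=0.7), 3))
```

Because the sign test uses only the direction of each per-topic difference, its power at a fixed `q` depends solely on the topic count. The discreteness of the binomial means power can dip slightly between neighbouring values of `n` even as it rises overall.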
Ben Carterette, Mark D. Smucker
Added 13 Aug 2010
Updated 13 Aug 2010
Type Conference
Year 2007
Where CIKM
Authors Ben Carterette, Mark D. Smucker