
WWW 2005, ACM

A framework for determining necessary query set sizes to evaluate web search effectiveness

We describe a framework of bootstrapped hypothesis testing for estimating the confidence in one web search engine outperforming another over any randomly sampled query set of a given size. To validate this framework, we have constructed and made available a precision-oriented test collection consisting of manual binary relevance judgments for each of the top ten results of ten web search engines across 896 queries and the single best result for each of those queries. Results from this bootstrapping approach over typical query set sizes indicate that examining repeated statistical tests is imperative, as a single test is quite likely to find significant differences that do not necessarily generalize. We also find that the number of queries needed for a repeatable evaluation in a dynamic environment such as the web is much higher than previously studied.

Categories and Subject Descriptors: H.3.5 [Information Storage and Retrieval]: Online Information Services - Web-based services; General...
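The following is a minimal sketch of the bootstrapped hypothesis-testing idea the abstract describes, assuming per-query effectiveness scores (e.g. precision at 10) for two engines over the same queries. The function name, the choice of a paired t-test, and the default parameters are illustrative assumptions, not the authors' exact procedure.

```python
import numpy as np
from scipy import stats


def bootstrap_confidence(scores_a, scores_b, query_set_size,
                         n_resamples=1000, alpha=0.05, seed=0):
    """Estimate how often engine A significantly outperforms engine B
    over randomly sampled query sets of a given size.

    scores_a, scores_b: per-query effectiveness scores (e.g. precision@10)
    for the same queries, aligned by index.
    """
    rng = np.random.default_rng(seed)
    scores_a = np.asarray(scores_a, dtype=float)
    scores_b = np.asarray(scores_b, dtype=float)
    n_queries = len(scores_a)

    wins = 0
    for _ in range(n_resamples):
        # Draw a query set of the requested size, sampling with replacement.
        idx = rng.integers(0, n_queries, size=query_set_size)
        a, b = scores_a[idx], scores_b[idx]
        # Paired significance test on the resampled query set
        # (a stand-in for whichever test the framework actually uses).
        _, p = stats.ttest_rel(a, b)
        if p < alpha and a.mean() > b.mean():
            wins += 1

    # Fraction of resampled query sets on which A significantly beats B.
    return wins / n_resamples
```

Calling, for example, bootstrap_confidence(p10_a, p10_b, query_set_size=50) yields the fraction of 50-query samples on which the difference is both significant and in A's favour; a single test on one fixed query set, by contrast, cannot show whether that conclusion would generalize to another sample of the same size.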
Type: Conference
Year: 2005
Where: WWW
Authors: Eric C. Jensen, Steven M. Beitzel, Ophir Frieder, Abdur Chowdhury