Information retrieval system evaluation is complicated by the need for manually assessed relevance judgments. Large manually-built directories on the web open the door to new evaluation procedures. By assuming that web pages are the known relevant items for queries that exactly match their title, we use the ODP (Open Directory Project) and Looksmart directories for system evaluation. We test our approach with a sample from a log of ten million web queries and show that such an evaluation is unbiased in terms of the directory used, stable with respect to the query set selected, and correlated with a reasonably large manual evaluation.

Categories and Subject Descriptors: H.3.4 [Information Storage and Retrieval]: Online Information Services – Web-based services

General Terms: Experimentation, Measurement
Steven M. Beitzel, Eric C. Jensen, Abdur Chowdhury
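
The title-match assumption stated in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the normalization step (lowercasing and collapsing whitespace before comparing a query with a title), and the choice of mean reciprocal rank as the score are all assumptions made for illustration; the abstract says only that queries exactly matching a page title are paired with that page as the known relevant item.

```python
from typing import Dict, Iterable, List, Set


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (an assumed notion of 'exact match')."""
    return " ".join(text.lower().split())


def build_pseudo_judgments(
    queries: Iterable[str], directory: Dict[str, str]
) -> Dict[str, Set[str]]:
    """Map each query that matches a directory title to the matching URLs.

    `directory` maps URL -> editor-assigned title (hypothetical input format).
    Queries that match no title are dropped from the evaluation set.
    """
    urls_by_title: Dict[str, Set[str]] = {}
    for url, title in directory.items():
        urls_by_title.setdefault(normalize(title), set()).add(url)

    judgments: Dict[str, Set[str]] = {}
    for query in queries:
        key = normalize(query)
        if key in urls_by_title:
            judgments[query] = urls_by_title[key]
    return judgments


def mean_reciprocal_rank(
    judgments: Dict[str, Set[str]], rankings: Dict[str, List[str]]
) -> float:
    """Average of 1/rank of the first known relevant URL per judged query.

    A query whose ranking contains no known relevant URL contributes 0.
    """
    if not judgments:
        return 0.0
    total = 0.0
    for query, relevant in judgments.items():
        for rank, url in enumerate(rankings.get(query, []), start=1):
            if url in relevant:
                total += 1.0 / rank
                break
    return total / len(judgments)
```

Given a directory dump and a query log, `build_pseudo_judgments` yields the automatically judged query set, and `mean_reciprocal_rank` scores one system's rankings against it. Comparing the scores obtained from two different directories, or from different query samples, corresponds to the bias and stability checks the abstract mentions.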