Evaluation of IR systems has always been difficult because of the need for manually assessed relevance judgments. The advent of large editor-driven taxonomies on the web opens the door to a new evaluation approach. We use the ODP (Open Directory Project) taxonomy to find sets of pseudo-relevant documents via one of two assumptions: 1) taxonomy entries are relevant to a given query if their editor-entered titles exactly match the query, or 2) all entries in a leaf-level taxonomy category are relevant to a given query if the category title exactly matches the query. We compare and contrast these two methodologies by evaluating six web search engines on a sample from an America Online log of ten million web queries, using mean reciprocal rank (MRR) measures for the first method and precision-based measures for the second. We show that this technique is stable with respect to the query set selected and that its results correlate with those of a reasonably large manual evaluation.

Categories and Subject Descriptors H.3.3 [Information St...
Steven M. Beitzel, Eric C. Jensen, Abdur Chowdhury
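
To make the two pseudo-relevance assumptions in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical toy taxonomy stored as `(category_title, entry_title, url)` tuples in which every listed category is leaf-level. It also assumes that "exact match" is tested after lowercasing and collapsing whitespace, which the abstract does not specify. All function names are illustrative.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical record layout: (leaf category title, editor-entered entry title, URL).
TaxonomyEntry = Tuple[str, str, str]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace (an assumed notion of 'exact match')."""
    return " ".join(text.lower().split())


def title_match_relevant(query: str, entries: List[TaxonomyEntry]) -> Set[str]:
    """Assumption 1: an entry is pseudo-relevant if its title matches the query."""
    q = normalize(query)
    return {url for _, title, url in entries if normalize(title) == q}


def category_match_relevant(query: str, entries: List[TaxonomyEntry]) -> Set[str]:
    """Assumption 2: every entry in a category whose title matches the query
    is pseudo-relevant."""
    q = normalize(query)
    return {url for category, _, url in entries if normalize(category) == q}


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    """1 / rank of the first pseudo-relevant result, or 0.0 if none is retrieved."""
    for rank, url in enumerate(ranked, start=1):
        if url in relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Dict[str, List[str]],
                         qrels: Dict[str, Set[str]]) -> float:
    """Average reciprocal rank over all queries that have pseudo-relevant sets."""
    if not qrels:
        return 0.0
    total = sum(reciprocal_rank(runs.get(q, []), rel) for q, rel in qrels.items())
    return total / len(qrels)


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k results that are pseudo-relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for url in ranked[:k] if url in relevant) / k
```

Under these assumptions, each search engine's ranked result list for a query would be scored with `mean_reciprocal_rank` against the title-match sets and with `precision_at_k` against the category-match sets, mirroring the pairing of measures and methods described in the abstract.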