We describe an evaluation of result set filtering techniques for providing ultra-high precision in the task of presenting related news for general web queries. In this task, the negative user experience caused by retrieving non-relevant documents has a much worse impact than failing to retrieve relevant ones. We adapt cost-based metrics from the document filtering domain to this result filtering problem in order to explicitly examine the tradeoff between missing relevant documents and retrieving non-relevant ones. A large manual evaluation of three simple threshold filters shows that the basic approach of counting matching title terms outperforms also incorporating selected abstract terms based on part-of-speech or higher-level linguistic structures. At the same time, these cost-based metrics allow us to explicitly determine which other tasks would benefit from these alternative techniques.
Categories and Subject Descriptors: H.3.5 [Information Storage and Retrieval]: Online Informatio...
Steven M. Beitzel, Eric C. Jensen, Abdur Chowdhury
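
To make the tradeoff described in the abstract concrete, the following is a minimal sketch of a title-term threshold filter scored with a linear cost function. It is not the authors' implementation: the function names, the whitespace tokenization, the example titles, and the cost weights (`miss_cost`, `false_alarm_cost`) are all assumptions chosen for illustration, and the linear cost is only one plausible member of the family of cost-based filtering metrics.

```python
def title_match_count(query: str, title: str) -> int:
    """Count distinct query terms that also occur in the title.

    Assumption: lowercase whitespace tokenization, no stemming or stopwords.
    """
    query_terms = set(query.lower().split())
    title_terms = set(title.lower().split())
    return len(query_terms & title_terms)


def threshold_filter(query: str, titles: list[str], min_matches: int) -> list[str]:
    """Keep only the titles that share at least `min_matches` terms with the query."""
    return [t for t in titles if title_match_count(query, t) >= min_matches]


def linear_cost(retrieved, relevant, miss_cost=1.0, false_alarm_cost=10.0):
    """Hypothetical linear cost: weighted misses plus weighted false alarms.

    The default weights are illustrative only; they encode the idea that
    showing a non-relevant result is worse than omitting a relevant one.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    misses = len(relevant - retrieved)        # relevant but filtered out
    false_alarms = len(retrieved - relevant)  # shown but not relevant
    return miss_cost * misses + false_alarm_cost * false_alarms


# Made-up example data, not taken from the paper's evaluation.
query = "hurricane florida damage"
titles = [
    "Hurricane damage mounts in Florida",
    "Florida orange prices rise",
    "Hurricane season forecast",
]
relevant = {titles[0], titles[2]}

strict = threshold_filter(query, titles, min_matches=2)
loose = threshold_filter(query, titles, min_matches=1)
print(linear_cost(strict, relevant), linear_cost(loose, relevant))
```

With these assumed weights, the stricter threshold misses one relevant title but shows nothing non-relevant, so it incurs a lower cost than the looser threshold, which shows one non-relevant title. Changing the ratio of the two weights is how such a metric can indicate which tasks would favor a more permissive filter.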