Charles L. A. Clarke

The direct application of standard ranking techniques to retrieve individual elements from a collection of XML documents often produces a result set in which the top ranks are dominated by a large number of elements taken from a small number of highly relevant documents. This paper presents and evaluates an algorithm that re-ranks this result set, with the aim of minimizing redundant content while preserving the benefits of element retrieval, including its ability to identify topic-focused components contained within relevant documents. The test collection developed by the INitiative for the Evaluation of XML Retrieval (INEX) forms the basis for the evaluation.

Categories and Subject Descriptors
H.3.3 [Information Systems]: Information Storage and Retrieval—Information Search and Retrieval

General Terms
Algorithms, Measurement, Performance, Experimentation

Keywords
XML, Ranking, Information Retrieval