Given the ranked lists of documents returned by multiple search engines in response to a given query, the problem of metasearch is to combine these lists in a way that optimizes the performance of the combination. This problem can be naturally decomposed into three subproblems: (1) normalizing the relevance scores given by the input systems, (2) estimating relevance scores for unretrieved documents, and (3) combining the newly acquired scores for each document into one, improved score. Research on the problem of metasearch has historically concentrated on algorithms for combining (normalized) scores. In this paper, we show that the techniques used for normalizing relevance scores and estimating the relevance scores of unretrieved documents can have a significant effect on the overall performance of metasearch. We propose two new normalization/estimation techniques and demonstrate empirically that the performance of well-known metasearch algorithms can be significantly improved through the use of these techniques.
Mark H. Montague, Javed A. Aslam
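To make the three-step decomposition concrete, the following is a minimal Python sketch of a conventional baseline, not the normalization/estimation techniques proposed in the paper: each system's scores are normalized to [0, 1] by shifting and scaling, unretrieved documents are assumed to take the minimum normalized score of 0, and the per-document scores are combined by summation (in the style of CombSUM). The function names and toy engine data are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, List

def normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Subproblem (1): shift and scale one system's scores into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0  # guard against a constant-score list
    return {doc: (s - lo) / span for doc, s in scores.items()}

def combine(systems: List[Dict[str, float]]) -> List[str]:
    """Subproblems (2) and (3): treat documents unretrieved by a system as
    scoring 0 for that system, then sum scores and rank by the total."""
    combined: Dict[str, float] = defaultdict(float)
    for scores in systems:
        for doc, s in normalize(scores).items():
            combined[doc] += s  # documents a system did not retrieve add 0
    return sorted(combined, key=combined.get, reverse=True)

if __name__ == "__main__":
    # Two hypothetical input systems with incomparable raw score ranges.
    engine_a = {"d1": 12.0, "d2": 7.5, "d3": 3.0}
    engine_b = {"d2": 0.9, "d4": 0.4}
    print(combine([engine_a, engine_b]))  # d2 ranks first: retrieved highly by both
```

The paper's point is that the choices made in the first two steps (how to normalize and what score to assign unretrieved documents) can matter as much as the combination rule in the final step.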