In this paper, we describe an information retrieval problem called collection fusion. The collection fusion problem is to maximize the number of relevant natural language documents retrieved, given a natural language query, multiple collections of documents, and a fixed total number of documents to retrieve. We describe two algorithms that use past queries to learn collection fusion strategies. Tests of these algorithms on a corpus of 742,000 documents indicate that they can learn good fusion strategies. Moreover, the strategies learned by our methods are consistently superior to those learned by a standard learning algorithm.
Geoffrey G. Towell, Ellen M. Voorhees, Narendra Ku
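
To make the problem statement concrete, the following minimal sketch shows the setting only: a fixed retrieval budget is split across collections, and an allocation is scored by how many relevant documents it retrieves. It is not either of the two algorithms described in the paper. The proportional split, the function names `allocate_budget` and `relevant_retrieved`, and the idea of using per-collection weights (for example, counts of relevant documents seen for past queries) are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def allocate_budget(weights: Dict[str, float], total: int) -> Dict[str, int]:
    """Split `total` documents across collections in proportion to `weights`.

    Hypothetical baseline, not the paper's method. `weights` holds one
    non-negative score per collection. Rounding uses the largest-remainder
    rule so the per-collection counts always sum to `total`.
    """
    if not weights:
        raise ValueError("at least one collection is required")
    if total < 0:
        raise ValueError("total must be non-negative")

    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        # No evidence for any collection: fall back to an even split.
        weights = {name: 1.0 for name in weights}
        weight_sum = float(len(weights))

    # Ideal (fractional) share for each collection.
    shares = {name: total * w / weight_sum for name, w in weights.items()}
    # Start from the integer part of each share.
    alloc = {name: int(share) for name, share in shares.items()}

    # Hand out the remaining slots to the largest fractional remainders.
    leftover = total - sum(alloc.values())
    by_remainder = sorted(
        shares, key=lambda name: shares[name] - alloc[name], reverse=True
    )
    for name in by_remainder[:leftover]:
        alloc[name] += 1
    return alloc


def relevant_retrieved(
    alloc: Dict[str, int],
    ranked: Dict[str, List[str]],
    relevant: Set[str],
) -> int:
    """Count relevant documents retrieved when each collection contributes
    its top `alloc[name]` documents from its own ranked list."""
    return sum(
        1
        for name, n in alloc.items()
        for doc_id in ranked[name][:n]
        if doc_id in relevant
    )
```

In this framing, a fusion strategy is any rule that produces the allocation for a new query, and `relevant_retrieved` is the quantity the collection fusion problem asks to maximize under the fixed total.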