ADC
2007
Springer

Distributed Text Retrieval From Overlapping Collections

In standard text retrieval systems, the documents are gathered and indexed on a single server. In distributed information retrieval (DIR), the documents are held in multiple collections; answers to queries are produced by selecting the collections to query and then merging results from these collections. However, in most prior research in the area, collections are assumed to be disjoint. In this paper, we investigate the effectiveness of different combinations of server selection and result merging algorithms in the presence of duplicates. We also test our hash-based method for efficiently detecting duplicates and near-duplicates in the lists of documents returned by collections. Our results, based on two different designs of test data, indicate that some DIR methods are more likely to return duplicate documents, and show that removing such redundant documents can have a significant impact on the final search effectiveness.
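The abstract does not spell out how the hash-based duplicate detection works. The sketch below illustrates the general idea only and is not the authors' method. It rests on three assumptions:

- Each collection returns `(doc_id, score, text)` tuples.
- Scores are directly comparable across collections. This is naive raw-score merging, not one of the merging algorithms evaluated in the paper.
- Near-duplicates can be found by comparing sets of hashed word k-gram shingles against a resemblance threshold.

The function names, `k=3`, and `threshold=0.8` are hypothetical choices made for illustration.

```python
import hashlib
from typing import List, Set, Tuple

# Hypothetical result format: (doc_id, score, text) from each collection.
Result = Tuple[str, float, str]


def shingle_hashes(text: str, k: int = 3) -> Set[int]:
    """Hash every run of k consecutive words into a 64-bit integer.

    hashlib is used instead of hash() so values are stable across runs.
    """
    words = text.lower().split()
    if len(words) < k:
        grams = [" ".join(words)]  # short text: treat it as one shingle
    else:
        grams = [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]
    return {
        int.from_bytes(hashlib.md5(g.encode("utf-8")).digest()[:8], "big")
        for g in grams
    }


def resemblance(a: Set[int], b: Set[int]) -> float:
    """Jaccard similarity of two shingle-hash sets (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def merge_without_duplicates(
    result_lists: List[List[Result]], threshold: float = 0.8, k: int = 3
) -> List[Tuple[str, float]]:
    """Merge per-collection lists by descending score, dropping near-duplicates.

    Assumes scores are comparable across collections (raw-score merging).
    Each candidate is compared with every document already kept, which is
    acceptable for short result lists but is not an optimized scheme.
    """
    pooled = [item for results in result_lists for item in results]
    pooled.sort(key=lambda item: item[1], reverse=True)

    kept: List[Tuple[str, float]] = []
    kept_shingles: List[Set[int]] = []
    for doc_id, score, text in pooled:
        shingles = shingle_hashes(text, k)
        # Skip this document if it closely resembles a higher-scoring one.
        if any(resemblance(shingles, prev) >= threshold for prev in kept_shingles):
            continue
        kept.append((doc_id, score))
        kept_shingles.append(shingles)
    return kept
```

Suppose two collections both hold a copy of the same page. Only the higher-scoring copy survives the merge, and the freed rank goes to the next distinct document. This matches the abstract's point that removing redundant documents can change final effectiveness.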
Milad Shokouhi, Justin Zobel, Yaniv Bernstein
Type Conference
Year 2007
Where ADC