A distributed information retrieval system with resource-selection and result-set merging capability was used to search subsets of the GOV2 document corpus for the 2008 TREC Million Query Track. The GOV2 collection was partitioned into hostname subcollections and distributed to multiple remote machines. The Multisearch demonstration application restricted each search to a fraction of the available subcollections that was predetermined by a resource-selection algorithm. Experiment results from topic-by-topic resource selection and aggregate topic resource selection are compared. The sensitivity of Multisearch retrieval performance to variations in the resource selection algorithm is discussed.
Christopher T. Fallen, Gregory B. Newby, Kylie McC