SUSHI: scoring scaled samples for server selection

Modern techniques for distributed information retrieval use a set of documents sampled from each server, but these samples have been underutilised in server selection. We describe a new server selection algorithm, SUSHI, which, unlike earlier algorithms, can make full use of the text of each sampled document and does not need training data. SUSHI can directly optimise for many common cases, including high-precision retrieval, and by including a simple stopping condition can do so while reducing network traffic. Our experiments compare SUSHI with alternatives and show that it achieves the same effectiveness as the best current methods while being substantially more efficient, selecting as few as 20% as many servers.

Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - selection process; H.3.4 [Information Storage and Retrieval]: Systems and Software - distributed systems

General Terms: Experimentation, Measurement

Keywords: Docu...

Paul Thomas, Milad Shokouhi