SIGIR 2006, ACM

Load balancing for term-distributed parallel retrieval

Large-scale web and text retrieval systems deal with amounts of data that greatly exceed the capacity of any single machine. To handle the necessary data volumes and query throughput rates, parallel systems are used, in which the document and index data are split across tightly-clustered distributed computing systems. The index data can be distributed either by document or by term. In this paper we examine methods for load balancing in term-distributed parallel architectures, and propose a suite of techniques for reducing net querying costs. In combination, the techniques we describe allow a 30% improvement in query throughput when tested on an eight-node parallel computer system.

Categories and Subject Descriptors: H.3.1 [Information Storage and Retrieval]: Content analysis and indexing – indexing methods; H.3.2 [Information Storage and Retrieval]: Information storage – file organization; H.3.3 [Information Storage and Retrieval]: Information search and retrieval – search proce...
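
The abstract's distinction between distributing an index by document and by term can be illustrated with a minimal Python sketch. This is not the authors' system or any of their load-balancing techniques: the function names, the toy documents and queries, and the round-robin assignment of documents and terms to nodes are all assumptions made for illustration only.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def partition_by_document(docs, num_nodes):
    """Document distribution: each node indexes only its own documents.
    Assumes integer document ids, assigned round-robin (hypothetical policy)."""
    shards = [dict() for _ in range(num_nodes)]
    for doc_id, text in docs.items():
        shards[doc_id % num_nodes][doc_id] = text
    return [build_inverted_index(shard) for shard in shards]

def partition_by_term(docs, num_nodes):
    """Term distribution: each node holds the complete postings list
    for its share of the vocabulary (round-robin over sorted terms)."""
    full_index = build_inverted_index(docs)
    nodes = [dict() for _ in range(num_nodes)]
    for i, term in enumerate(sorted(full_index)):
        nodes[i % num_nodes][term] = full_index[term]
    return nodes

def node_loads(term_nodes, queries):
    """Count the postings each term-distributed node reads for a query stream."""
    loads = [0] * len(term_nodes)
    for query in queries:
        for term in query.lower().split():
            for n, node in enumerate(term_nodes):
                if term in node:
                    loads[n] += len(node[term])
    return loads

# Toy data, invented for this sketch.
docs = {
    0: "parallel retrieval systems",
    1: "term distributed retrieval",
    2: "load balancing retrieval",
    3: "parallel load",
}
queries = ["retrieval load", "parallel retrieval", "load balancing"]

doc_nodes = partition_by_document(docs, 2)
term_nodes = partition_by_term(docs, 2)
loads = node_loads(term_nodes, queries)
```

With this toy data, the postings for "retrieval" are split across both nodes under document distribution, but sit entirely on one node under term distribution. The resulting `loads` of `[11, 2]` show how a naive term assignment can leave one node doing most of the work, which is the kind of skew that load balancing in a term-distributed architecture is meant to address.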
Alistair Moffat, William Webber, Justin Zobel
Added: 14 Jun 2010
Updated: 14 Jun 2010
Type: Conference
Year: 2006
Where: SIGIR
Authors: Alistair Moffat, William Webber, Justin Zobel