
CIKM
2004
Springer

Approximating the top-m passages in a parallel question answering system

We examine the problem of retrieving the top-m ranked items from a large collection that is randomly distributed across an n-node system. To guarantee the top m overall, we must retrieve the top m from the subcollection stored on each node and merge the results. However, if we are willing to accept a small probability that one or more of the top-m items may be missed, we can reduce computation time by retrieving only the top k < m from each node. In this paper, we demonstrate that this simple observation can be exploited in a realistic application to produce a substantial efficiency improvement without compromising the quality of the retrieved results. To support our claim, we present a statistical model that predicts the impact of the optimization. The paper is structured around a specific application, passage retrieval for question answering, but the primary results are more broadly applicable.

Categories and Subject Descriptors H.3.4 [Information Systems]: Inf...
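The trade-off described in the abstract can be made concrete with a small calculation. What follows is a minimal sketch and not the authors' statistical model. It assumes that each item is placed on one of the n nodes uniformly and independently at random. Under that assumption, fetching the top k from every node recovers all of the global top m exactly when no single node holds more than k of those m items. The function names `prob_top_m_captured` and `simulate` are hypothetical and chosen only for illustration. A brute-force simulation of the per-node top-k and merge step is included as a cross-check.

```python
import heapq
import random
from functools import lru_cache
from math import comb


def prob_top_m_captured(m: int, n: int, k: int) -> float:
    """Probability that taking the top k from each of n nodes yields the
    exact global top m. Assumes (hypothetically) that every item lands on a
    node chosen uniformly and independently at random."""
    if k >= m:
        return 1.0  # each node returns at least as many items as we need
    if n * k < m:
        return 0.0  # too few candidates in total to cover the top m

    @lru_cache(maxsize=None)
    def ways(nodes: int, items: int) -> int:
        # Count placements of `items` labeled items onto `nodes` nodes
        # such that no node receives more than k of them.
        if nodes == 0:
            return 1 if items == 0 else 0
        return sum(comb(items, j) * ways(nodes - 1, items - j)
                   for j in range(min(k, items) + 1))

    # Favourable placements divided by all n**m equally likely placements.
    return ways(n, m) / n ** m


def simulate(m: int, n: int, k: int, collection_size: int = 1000,
             trials: int = 500, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability. Scatters random scores
    across nodes, merges each node's local top k, and compares the merged
    result with the true global top m."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        scores = [rng.random() for _ in range(collection_size)]
        nodes = [[] for _ in range(n)]
        for idx, score in enumerate(scores):
            nodes[rng.randrange(n)].append((score, idx))
        # Each node contributes only its local top k candidates.
        candidates = [item for node in nodes for item in heapq.nlargest(k, node)]
        merged = heapq.nlargest(m, candidates)
        true_top = heapq.nlargest(m, ((s, i) for i, s in enumerate(scores)))
        hits += merged == true_top
    return hits / trials
```

For example, `prob_top_m_captured(20, 8, 6)` gives the chance of an exact top-20 result when only 6 items are fetched from each of 8 nodes, and `simulate(20, 8, 6)` should agree with it up to sampling noise. The exact count uses a small dynamic program over nodes instead of sampling, so even tiny miss probabilities can be computed directly.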
Charles L. A. Clarke, Egidio L. Terra
Added 01 Jul 2010
Updated 01 Jul 2010
Type Conference
Year 2004
Where CIKM
Authors Charles L. A. Clarke, Egidio L. Terra