Approximating the top-m passages in a parallel question answering system

We examine the problem of retrieving the top-m ranked items from a large collection, randomly distributed across an n-node system. In order to retrieve the top m overall, we must retrieve the top m from the subcollection stored on each node and merge the results. However, if we are willing to accept a small probability that one or more of the top-m items may be missed, it is possible to reduce computation time by retrieving only the top k < m from each node. In this paper, we demonstrate that this simple observation can be exploited in a realistic application to produce a substantial efficiency improvement without compromising the quality of the retrieved results. To support our claim, we present a statistical model that predicts the impact of the optimization. The paper is structured around a specific application (passage retrieval for question answering), but the primary results are more broadly applicable.

Categories and Subject Descriptors H.3.4 [Information Systems]: Inf...
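
The top-k-per-node idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation or their statistical model, which the abstract does not spell out. The function names `approx_top_m` and `miss_probability_bound` are hypothetical, scores are assumed to be distinct numbers, and each top-m item is assumed to land on a node independently and uniformly at random. Under those assumptions, an item is missed exactly when some node holds more than k of the true top m, so a simple union bound over nodes with a binomial tail gives an upper bound on the miss probability.

```python
import heapq
from math import comb


def approx_top_m(nodes, m, k):
    """Merge the top-k scores from each node and return the best m overall.

    `nodes` is a list of per-node score lists (an illustrative stand-in
    for the subcollections stored on each node).
    """
    candidates = []
    for scores in nodes:
        # Each node contributes only its k best items (k < m saves work).
        candidates.extend(heapq.nlargest(k, scores))
    return heapq.nlargest(m, candidates)


def miss_probability_bound(m, n, k):
    """Upper bound on P(at least one true top-m item is missed).

    Assumes each top-m item falls on one of n nodes uniformly at random,
    so the count on a given node is Binomial(m, 1/n). The union bound
    multiplies the single-node tail P(count > k) by n.
    """
    p = 1.0 / n
    tail = sum(comb(m, j) * p**j * (1 - p) ** (m - j) for j in range(k + 1, m + 1))
    return min(1.0, n * tail)


if __name__ == "__main__":
    nodes = [[9, 1, 2], [8, 7, 3], [6, 5, 4]]
    print(approx_top_m(nodes, m=3, k=1))  # one node holds two of the top 3
    print(approx_top_m(nodes, m=3, k=2))
    print(miss_probability_bound(m=20, n=8, k=10))
```

In the toy data, the second node holds two of the three best scores, so k = 1 drops one of them while k = 2 recovers the exact top 3. The bound is deliberately loose; it only shows how the miss probability shrinks as k grows toward m.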

Charles L. A. Clarke, Egidio L. Terra

CIKM 2004 | Question Answering | Substantial Efficiency Improvement | Top-m Ranked Items

Post Info

Added	01 Jul 2010
Updated	01 Jul 2010
Type	Conference
Year	2004
Where	CIKM
Authors	Charles L. A. Clarke, Egidio L. Terra
