Sciweavers

VLDB
2004
ACM

Top-k Query Evaluation with Probabilistic Guarantees

14 years 4 months ago
Top-k Query Evaluation with Probabilistic Guarantees
Top-k queries based on ranking elements of multidimensional datasets are a fundamental building block for many kinds of information discovery. The best known general-purpose algorithm for evaluating top-k queries is Fagin’s threshold algorithm (TA). Since the user’s goal behind top-k queries is to identify one or a few relevant and novel data items, it is intriguing to use approximate variants of TA to reduce run-time costs. This paper introduces a family of approximate top-k algorithms based on probabilistic arguments. When scanning index lists of the underlying multidimensional data space in descending order of local scores, various forms of convolution and derived bounds are employed to predict when it is safe, with high probability, to drop candidate items and to prune the index scans. The precision and the efficiency of the developed methods are experimentally evaluated based on a large Web corpus and a structured data collection.
Martin Theobald, Gerhard Weikum, Ralf Schenkel
Added 02 Jul 2010
Updated 02 Jul 2010
Type Conference
Year 2004
Where VLDB
Authors Martin Theobald, Gerhard Weikum, Ralf Schenkel
Comments (0)