Sciweavers

EDBT
2010
ACM

Turbo-charging hidden database samplers with overflowing queries and skew reduction

Recently, there has been growing interest in random sampling from online hidden databases. These databases reside behind form-like web interfaces which allow users to execute search queries by specifying the desired values for certain attributes, and the system responds by returning a few (e.g., top-k) tuples that satisfy the selection conditions, sorted by a suitable scoring function. In this paper, we consider the problem of uniform random sampling over such hidden databases. A key challenge is to eliminate the skew of samples incurred by the selective return of highly ranked tuples. To address this challenge, all state-of-the-art samplers share a common approach: they do not use overflowing queries. This is done in order to avoid favoring highly ranked tuples and thus incurring high skew in the retrieved samples. However, not considering overflowing queries substantially impacts sampling efficiency. In this paper, we propose novel sampling techniques which do leverage overflowing q...
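To make the setting in the abstract concrete, the following is a minimal sketch of a top-k search interface and of the skew that results from sampling naively out of an overflowing query. It is not the authors' sampler: the toy table `DB`, the limit `K`, and the helpers `search` and `naive_sample` are all hypothetical names introduced for illustration, and the tuple id is assumed to double as the ranking score.

```python
import random
from collections import Counter

K = 3  # assumed top-k limit of the hypothetical interface

# Toy hidden database: 8 tuples over three Boolean attributes.
# Assumption: the id doubles as the ranking score (higher id ranks higher).
DB = [{"id": i, "a": (i >> 2) & 1, "b": (i >> 1) & 1, "c": i & 1} for i in range(8)]


def search(conditions, k=K):
    """Return (top-k matching tuples by score, overflow flag).

    A query overflows when more than k tuples satisfy the conditions,
    so only the k highest-ranked matches are visible to the user.
    """
    matches = [t for t in DB if all(t[attr] == v for attr, v in conditions.items())]
    matches.sort(key=lambda t: t["id"], reverse=True)
    return matches[:k], len(matches) > k


def naive_sample(rng):
    """Pick uniformly from the visible results of the empty (overflowing) query."""
    returned, _ = search({})
    return rng.choice(returned)["id"]


if __name__ == "__main__":
    rng = random.Random(0)
    counts = Counter(naive_sample(rng) for _ in range(300))
    # Only the three highest-ranked ids can ever appear; ids 0..4 are never sampled.
    print(sorted(counts))
```

In this toy setup the empty query matches all eight tuples but exposes only the three highest-ranked ones, so a sampler that draws directly from its results can never return the remaining five. A narrower query such as `search({"a": 0, "b": 0})` matches two tuples, does not overflow, and exposes every match.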
Arjun Dasgupta, Nan Zhang 0004, Gautam Das
Added 02 Sep 2010
Updated 02 Sep 2010
Type Conference
Year 2010
Where EDBT
Authors Arjun Dasgupta, Nan Zhang 0004, Gautam Das