Histograms revisited: when are histograms the best approximation method for aggregates over joins?


Download www.cise.ufl.edu

The traditional statistical assumption for interpreting histograms and justifying approximate query processing methods based on them is that all elements in a bucket have the same frequency, the so-called uniform distribution assumption. In this paper we show that a significantly less restrictive statistical assumption (the elements within a bucket are randomly arranged even though they might have different frequencies) leads to identical formulae for approximating aggregate queries using histograms. This observation allows us to identify scenarios in which histograms are well suited as approximation methods (in fact, we show that in these situations sampling and sketching are significantly worse) and provide tight error guarantees for the quality of approximations. At the same time we show that, on average, histograms are rather poor approximators outside these scenarios.
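
To make the central claim concrete, here is a minimal sketch for a single histogram bucket. It is not code from the paper. The frequency vectors `f` and `g`, the function names, and the use of the textbook per-bucket join-size estimate F * G / w are assumptions made for illustration. The sketch compares the estimate obtained under the uniform distribution assumption with the exact average join size when the values in the bucket are arranged at random.

```python
from fractions import Fraction
from itertools import permutations


def uniform_estimate(f_total, g_total, width):
    """Join-size estimate for one bucket under the uniform assumption.

    Every one of the `width` values is assumed to have frequency
    f_total / width in R and g_total / width in S, so the estimate is
    width * (f_total / width) * (g_total / width) = f_total * g_total / width.
    """
    return Fraction(f_total * g_total, width)


def random_arrangement_mean(f, g):
    """Exact mean of sum_i f[i] * g[pi(i)] over all arrangements pi of g."""
    arrangements = list(permutations(g))
    total = sum(sum(a * b for a, b in zip(f, p)) for p in arrangements)
    return Fraction(total, len(arrangements))


# Hypothetical, deliberately non-uniform frequencies inside one bucket.
f = [5, 1, 0, 2]  # frequencies of the bucket's values in relation R
g = [3, 3, 1, 1]  # frequencies of the same values in relation S

uniform = uniform_estimate(sum(f), sum(g), len(f))
randomized = random_arrangement_mean(f, g)
true_size = sum(a * b for a, b in zip(f, g))

print(uniform, randomized, true_size)  # prints: 16 16 20
```

The two estimates agree even though the frequencies are far from uniform. The true join size for this particular arrangement (20) differs from both, which is the kind of gap the error guarantees mentioned in the abstract are meant to bound.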

Alin Dobra


Database | PODS 2005 | Restrictive Statistical Assumption | Traditional Statistical Assumption | Uniform Distribution Assumption |


Added	08 Dec 2009
Updated	08 Dec 2009
Type	Conference
Year	2005
Where	PODS
Authors	Alin Dobra

Sciweavers
