Histograms reloaded: the merits of bucket diversity


Virtually all histograms store for each bucket the number of distinct values it contains and their average frequency. In this paper, we question this paradigm. We start out by investigating the estimation precision of three commercial database systems which also follow the above paradigm. It turns out that huge errors are quite common. We then introduce new bucket types and investigate their accuracy when building optimal histograms with them. The results are ambiguous. There is no clear winner among the bucket types. At this point, we (1) switch to heterogeneous histograms, where different buckets of the same histogram possibly are of different types, and (2) design more bucket types. The nice consequence of introducing heterogeneous histograms is that we can guarantee decent upper error bounds while at the same time heterogeneous histograms require far less space than homogeneous histograms.

Categories and Subject Descriptors H.2.4 [Database Management]: Systems—Query process
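
To make the conventional paradigm from the abstract concrete, the following is a minimal Python sketch of a bucket that stores only the number of distinct values and their average frequency. It is not taken from the paper: the names `Bucket`, `build_bucket`, and `estimate_range`, the integer value domain, and the uniform-spread assumption used for estimation are all illustrative assumptions. On skewed data, such a bucket can be far off, which is the kind of error the abstract refers to.

```python
from dataclasses import dataclass


@dataclass
class Bucket:
    # Hypothetical bucket layout; field names are illustrative assumptions.
    lo: int          # inclusive lower bound of the bucket's value range
    hi: int          # exclusive upper bound of the bucket's value range
    distinct: int    # number of distinct values inside [lo, hi)
    avg_freq: float  # average frequency of those distinct values


def build_bucket(freqs, lo, hi):
    """Summarize a value -> count mapping restricted to [lo, hi)."""
    counts = [c for v, c in freqs.items() if lo <= v < hi]
    distinct = len(counts)
    avg_freq = sum(counts) / distinct if distinct else 0.0
    return Bucket(lo, hi, distinct, avg_freq)


def estimate_range(bucket, lo, hi):
    """Estimate the number of rows with lo <= value < hi.

    Assumes (for illustration only) that the distinct values are spread
    uniformly over the bucket and that each occurs avg_freq times.
    """
    width = bucket.hi - bucket.lo
    if width <= 0:
        return 0.0
    overlap = max(0, min(hi, bucket.hi) - max(lo, bucket.lo))
    return bucket.distinct * bucket.avg_freq * overlap / width


# Made-up, heavily skewed data: one value dominates the bucket.
freqs = {0: 1000, 1: 1, 2: 1, 3: 1, 7: 1}
bucket = build_bucket(freqs, 0, 8)

true_count = sum(c for v, c in freqs.items() if 4 <= v < 8)
estimate = estimate_range(bucket, 4, 8)
print(f"true={true_count} estimate={estimate:.1f}")
```

For this made-up data the query range [4, 8) contains a single row, while the estimate comes out at roughly 502, because the one frequent value inflates the bucket's average frequency.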

Carl-Christian Kanne, Guido Moerkotte


Bucket Types | Database | Heterogeneous Histograms | Histograms | SIGMOD 2010 |


Post Info

Added	30 Jan 2011
Updated	30 Jan 2011
Type	Journal
Year	2010
Where	SIGMOD
Authors	Carl-Christian Kanne, Guido Moerkotte
