Tracking Join and Self-Join Sizes in Limited Storage

Query optimizers rely on fast, high-quality estimates of result sizes in order to choose among alternative join plans. Self-join sizes of relations provide bounds on the join size of any pair of such relations. The self-join size also indicates the degree of skew in the data, and has been advocated for several estimation procedures. Exact computation of the self-join size requires storage proportional to the number of distinct attribute values, which may be prohibitively large. In this paper, we study algorithms for tracking (approximate) self-join sizes in limited storage in the presence of insertions and deletions to the relations. Such algorithms detect changes in the degree of skew without an expensive recomputation from the base data. We show that an algorithm based on a tug-of-war approach provides a more accurate estimation than one based on a sample-and-count approach, which is in turn more accurate than a sampling-only approach. Next, we study algorithms for tracking (approximate) join sizes in l...
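
The tug-of-war idea mentioned above can be illustrated with a minimal Python sketch. It assumes non-negative integer attribute values, and every name in it (TugOfWarSketch, estimate_join, groups, per_group, seed) is hypothetical rather than taken from the paper. Each counter accumulates f_v times a random +1/-1 sign s(v) over all values v, so an insertion or deletion is a single addition per counter, and the expected square of a counter is the self-join size SJ(R) = sum of f_v^2. The signs come from a random cubic polynomial modulo 2^61 - 1, a standard four-wise independent family substituted here as an assumption, and the squared counters are combined by taking the median of group means. The join estimate multiplies the counters of two sketches that share the same signs. The exact helpers also make it easy to check the Cauchy-Schwarz inequality |R join S| = sum of f_v * g_v <= sqrt(SJ(R) * SJ(S)), one way in which self-join sizes bound join sizes.

```python
import random
import statistics
from collections import Counter

# Mersenne prime used as the modulus of the degree-3 hash polynomials.
_P = (1 << 61) - 1


def exact_self_join_size(values):
    """Exact SJ(R) = sum_v f_v**2; needs one counter per distinct value."""
    return sum(f * f for f in Counter(values).values())


def exact_join_size(r_values, s_values):
    """Exact equi-join size |R join S| = sum_v f_v * g_v."""
    f, g = Counter(r_values), Counter(s_values)
    return sum(count * g[v] for v, count in f.items())


def _median_of_means(samples, groups, per_group):
    """Average each group of samples, then take the median of the averages."""
    means = [statistics.mean(samples[i * per_group:(i + 1) * per_group])
             for i in range(groups)]
    return statistics.median(means)


class TugOfWarSketch:
    """Hypothetical tug-of-war synopsis over non-negative integer values.

    Counter i holds Z_i = sum_v f_v * s_i(v), where s_i(v) is +1 or -1 taken
    from the parity of a random degree-3 polynomial mod _P (an assumed
    four-wise independent sign family). E[Z_i**2] is the self-join size, up
    to the tiny parity bias caused by the odd modulus.
    """

    def __init__(self, groups=5, per_group=16, seed=0):
        rng = random.Random(seed)
        self.groups, self.per_group = groups, per_group
        # Four polynomial coefficients (one sign function) per counter.
        self.coeffs = [tuple(rng.randrange(_P) for _ in range(4))
                       for _ in range(groups * per_group)]
        self.counters = [0] * (groups * per_group)

    @staticmethod
    def _sign(coeffs, value):
        a0, a1, a2, a3 = coeffs
        h = (((a3 * value + a2) * value + a1) * value + a0) % _P
        return 1 if h & 1 else -1

    def update(self, value, delta=1):
        """Apply an insertion (delta=+1) or a deletion (delta=-1) of one tuple."""
        for i, coeffs in enumerate(self.coeffs):
            self.counters[i] += delta * self._sign(coeffs, value)

    def estimate_self_join(self):
        squares = [z * z for z in self.counters]
        return _median_of_means(squares, self.groups, self.per_group)


def estimate_join(r_sketch, s_sketch):
    """Estimate |R join S| from two sketches built with the same seed and shape."""
    if r_sketch.coeffs != s_sketch.coeffs:
        raise ValueError("sketches must share the same sign functions (same seed)")
    products = [a * b for a, b in zip(r_sketch.counters, s_sketch.counters)]
    return _median_of_means(products, r_sketch.groups, r_sketch.per_group)
```

For example, inserting the value 7 ten times makes every counter +10 or -10, so estimate_self_join() returns exactly 100, and deleting those ten tuples returns every counter to 0. Because the counters are linear in the frequencies, deletions need no access to the base data. The two shape parameters trade storage for accuracy: more counters per group shrink the variance of each group mean, and more groups make the median robust to an unlucky group.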

Noga Alon, Phillip B. Gibbons, Yossi Matias, Mario Szegedy

Database | Join Size | PODS 1999 | Relations Provide Bounds | Self-join Sizes |

Post Info

Added	03 Aug 2010
Updated	03 Aug 2010
Type	Conference
Year	1999
Where	PODS
Authors	Noga Alon, Phillip B. Gibbons, Yossi Matias, Mario Szegedy