On Random Sampling over Joins

A major bottleneck in implementing sampling as a primitive relational operation is the inefficiency of sampling the output of a query. It is not even known whether it is possible to generate a sample of a join tree without first evaluating the join tree completely. We undertake a detailed study of this problem and attempt to analyze it in a variety of settings. We present theoretical results explaining the difficulty of this problem and setting limits on the efficiency that can be achieved. Based on new insights into the interaction between join and sampling, we develop join sampling techniques for the settings where our negative results do not apply. Our new sampling algorithms are significantly more efficient than those known earlier. We present experimental evaluation of our techniques on Microsoft's SQL Server 7.0.
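To make the problem concrete, the following is a minimal Python sketch of the naive baseline the abstract alludes to: evaluate the join completely, then draw a uniform sample from its output. It is not taken from the paper. The toy relations `R` and `S`, the choice of the first field as the join key, and the function names `hash_join` and `naive_join_sample` are all illustrative assumptions.

```python
import random
from collections import defaultdict

def hash_join(r, s):
    """Equi-join r and s on the first field of each tuple (assumed join key)."""
    index = defaultdict(list)
    for s_row in s:
        index[s_row[0]].append(s_row)
    # Pair every r tuple with every s tuple that shares its key.
    return [(r_row, s_row) for r_row in r for s_row in index.get(r_row[0], [])]

def naive_join_sample(r, s, k, rng):
    """Baseline: materialise the whole join, then draw k tuples with replacement."""
    joined = hash_join(r, s)
    if not joined:
        return []
    return [rng.choice(joined) for _ in range(k)]

# Hypothetical toy relations, keyed on their first field.
R = [(1, "a"), (2, "b"), (2, "c")]
S = [(1, "x"), (2, "y"), (2, "z"), (3, "w")]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = naive_join_sample(R, S, 4, rng)
```

The cost of this baseline is proportional to the size of the full join output, regardless of how small the requested sample is. That is the inefficiency the abstract describes.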

Surajit Chaudhuri, Rajeev Motwani, Vivek R. Narasayya

Database | Join Sampling Techniques | Join Tree | Primitive Relational Operation | SIGMOD 1999

» Histograms revisited: when are histograms the best approximation method for aggregates over...

» Linked Bernoulli Synopses: Sampling along Foreign Keys

» Memory-Limited Execution of Windowed Stream Joins

» Load Shedding for Window Joins on Multiple Data Streams

» End-biased Samples for Join Cardinality Estimation

» Descriptive Sampling: An Improvement over Latin Hypercube Sampling

» Tracking Join and Self-Join Sizes in Limited Storage

» Join Reordering by Join Simulation

Post Info

Added	03 Aug 2010
Updated	03 Aug 2010
Type	Conference
Year	1999
Where	SIGMOD
Authors	Surajit Chaudhuri, Rajeev Motwani, Vivek R. Narasayya
