Sampling-Based Estimation of the Number of Distinct Values of an Attribute


We provide several new sampling-based estimators of the number of distinct values of an attribute in a relation. We compare these new estimators to estimators from the database and statistical literature empirically, using a large number of attribute-value distributions drawn from a variety of real-world databases. This appears to be the first extensive comparison of distinct-value estimators in either the database or statistical literature, and is certainly the first to use highly skewed data of the sort frequently encountered in database applications. Our experiments indicate that a new “hybrid” estimator yields the highest precision on average for a given sampling fraction. This estimator explicitly takes into account the degree of skew in the data and combines a new “smoothed jackknife” estimator with an estimator due to Shlosser. We investigate how the hybrid estimator behaves as we scale up the size of the database.
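As a rough illustration of what a sampling-based distinct-value estimator computes, the sketch below implements Shlosser's estimator in the form it is commonly stated in the literature: D = d + f1 · Σ (1 − q)^i f_i / Σ i q (1 − q)^(i−1) f_i, where d is the number of distinct values in the sample, f_i is the number of values that appear exactly i times in the sample, and q is the sampling fraction. It is not the paper's hybrid or smoothed jackknife estimator, and the function names `frequency_counts` and `shlosser_estimate` are hypothetical names chosen for this example.

```python
from collections import Counter


def frequency_counts(sample):
    """Return (d, f) for a sample.

    d is the number of distinct values in the sample; f[i] is the number
    of values that occur exactly i times in the sample.
    """
    per_value = Counter(sample)          # value -> occurrences in the sample
    f = Counter(per_value.values())      # occurrence count i -> f_i
    return len(per_value), f


def shlosser_estimate(sample, q):
    """Estimate the number of distinct values in the full relation.

    Assumption: q is the sampling fraction n/N with 0 < q <= 1, and the
    formula is the commonly cited form of Shlosser's estimator.
    """
    d, f = frequency_counts(sample)
    numerator = sum((1 - q) ** i * f_i for i, f_i in f.items())
    denominator = sum(i * q * (1 - q) ** (i - 1) * f_i for i, f_i in f.items())
    if denominator == 0:
        # Empty sample, or a full scan with no singletons: nothing to add.
        return float(d)
    return d + f[1] * numerator / denominator


if __name__ == "__main__":
    # Four sampled rows out of eight (q = 0.5); "b" and "c" are seen once.
    print(shlosser_estimate(["a", "a", "b", "c"], q=0.5))
```

For the four-row sample in the usage example, d = 3 and f1 = 2, so the estimate comes out to roughly 4.67. When q = 1 the correction term vanishes and the estimate equals the observed number of distinct values.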

Peter J. Haas, Jeffrey F. Naughton, S. Seshadri, Lynne Stokes


Database | Distinct-value Estimators | Estimator | Statistical Literature | VLDB 1995 |


» Distinct Sampling for Highly-Accurate Answers to Distinct Values Queries and Event Reports

» Finite time bounds for sampling based fitted value iteration

» A Bayesian Approach to Estimating the Selectivity of Conjunctive Predicates

» Strings with Maximally Many Distinct Subsequences and Substrings

» Selectivity estimators for multidimensional range queries over real attributes

» The average-case complexity of counting distinct elements

» Segmentation of Distinct Homogeneous Color Regions in Images

» On Modeling Profiles Instead of Values

Post Info

Added	26 Aug 2010
Updated	26 Aug 2010
Type	Conference
Year	1995
Where	VLDB
Authors	Peter J. Haas, Jeffrey F. Naughton, S. Seshadri, Lynne Stokes
