Sciweavers

VLDB
2007
ACM

A Simple and Efficient Estimation Method for Stream Expression Cardinalities

14 years 11 months ago
A Simple and Efficient Estimation Method for Stream Expression Cardinalities
Estimating the cardinality (i.e. number of distinct elements) of an arbitrary set expression defined over multiple distributed streams is one of the most fundamental queries of interest. Earlier methods based on probabilistic sketches have focused mostly on the sketching algorithms. However, the estimators do not fully utilize the information in the sketches and thus are not statistically efficient. In this paper, we develop a novel statistical model and an efficient yet simple estimator for the cardinalities based on a continuous variant of the well known Flajolet-Martin sketches. Specifically, we show that, for two streams, our estimator has almost the same statistical efficiency as the Maximum Likelihood Estimator (MLE), which is known to be optimal in the sense of Cramer-Rao lower bounds under regular conditions. Moreover, as the number of streams gets larger, our estimator is still computationally simple, but the MLE becomes intractable due to the complexity of the likelihood. Le...
Aiyou Chen, Jin Cao, Tian Bu
Added 05 Dec 2009
Updated 05 Dec 2009
Type Conference
Year 2007
Where VLDB
Authors Aiyou Chen, Jin Cao, Tian Bu
Comments (0)