ICDE
2012
IEEE

A General Method for Estimating Correlated Aggregates over a Data Stream

On a stream of two-dimensional data items (x, y), where x is an item identifier and y is a numerical attribute, a correlated aggregate query requires us to first apply a selection predicate along the second (y) dimension, followed by an aggregation along the first (x) dimension. For selection predicates of the form (y < c) or (y > c), where parameter c is provided at query time, we present new streaming algorithms and lower bounds for estimating statistics of the resulting substream of elements that satisfy the predicate. We provide the first sublinear space algorithms for a large family of statistics in this model, including frequency moments. We experimentally validate our algorithms, showing that their memory requirements are significantly smaller than existing linear storage schemes for large datasets, while simultaneously achieving fast per-record processing time. We also study the problem when the items have weights. Allowing negative weights allows for analyzing va...
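
To make the query semantics concrete, the following is a minimal sketch of an exact, linear-storage baseline: it keeps every (x, y) pair and answers a correlated frequency-moment query for a threshold c supplied at query time. It is not the sublinear-space algorithm of the paper, only an illustration of the quantity being estimated. The function name `correlated_frequency_moment` and its parameters are assumptions introduced here for illustration.

```python
from collections import Counter


def correlated_frequency_moment(stream, c, k=2, predicate="lt"):
    """Exact k-th frequency moment of the identifiers x whose y passes the predicate.

    Hypothetical helper, not from the paper: it stores the whole stream,
    so its space is linear in the stream length.
    """
    # Step 1: selection along the y dimension (y < c or y > c).
    if predicate == "lt":
        selected = (x for x, y in stream if y < c)
    elif predicate == "gt":
        selected = (x for x, y in stream if y > c)
    else:
        raise ValueError("predicate must be 'lt' or 'gt'")

    # Step 2: aggregation along the x dimension.
    # F_k is the sum over identifiers of (frequency of identifier) ** k.
    counts = Counter(selected)
    return sum(freq ** k for freq in counts.values())


# Toy stream of (identifier, numerical attribute) pairs.
stream = [("a", 1.0), ("b", 2.5), ("a", 3.0), ("c", 0.5), ("a", 2.0)]

# Threshold c is chosen only at query time.
f2_below = correlated_frequency_moment(stream, c=2.6, k=2, predicate="lt")
f0_below = correlated_frequency_moment(stream, c=2.6, k=0, predicate="lt")
f2_above = correlated_frequency_moment(stream, c=2.6, k=2, predicate="gt")
```

Because c is unknown while the stream is being read, this baseline cannot discard any item in advance, which is the linear storage cost the abstract contrasts against.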
Srikanta Tirthapura, David P. Woodruff
Added 28 Sep 2012
Updated 28 Sep 2012
Type Journal
Year 2012
Where ICDE
Authors Srikanta Tirthapura, David P. Woodruff