PODS
2005
ACM

Space complexity of hierarchical heavy hitters in multi-dimensional data streams

Heavy hitters, which are items occurring with frequency above a given threshold, are an important aggregation and summary tool when processing data streams or data warehouses. Hierarchical heavy hitters (HHHs) have been introduced as a natural generalization for hierarchical data domains, including multi-dimensional data. An item x in a hierarchy is called a φ-HHH if its frequency after discounting the frequencies of all its descendant hierarchical heavy hitters exceeds φn, where φ is a user-specified parameter and n is the size of the data set. Recently, single-pass schemes have been proposed for computing φ-HHHs using space roughly O((1/φ) log(φn)). The frequency estimates of these algorithms, however, hold only for the total frequencies of items, and not the discounted frequencies; this leads to false positives because the discounted frequency can be significantly smaller than the total frequency. This paper attempts to explain the difficulty of finding hierarchical heavy h...
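The difference between discounted and total frequency in the φ-HHH definition can be made concrete with a small offline sketch. It assumes a one-dimensional prefix hierarchy in which every item is a tuple of the same length (for example, address octets), with the empty tuple as the root, and it uses the strict "exceeds φn" test from the definition. The names `exact_hhh` and `total_frequency` and the tuple encoding are hypothetical. The sketch keeps exact counts for every item, so it illustrates the definition only; it is not a space-bounded streaming algorithm and does not cover the multi-dimensional case.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical encoding: a leaf is a full-length path; a prefix is any shorter tuple.
Path = Tuple[str, ...]


def exact_hhh(stream: Iterable[Path], phi: float) -> Dict[Path, int]:
    """Offline phi-HHHs of a 1-D prefix hierarchy, mapped to their discounted counts.

    Assumes every item has the same length; the root is the empty tuple ().
    """
    counts = Counter(stream)
    if not counts:
        return {}
    n = sum(counts.values())
    depth = len(next(iter(counts)))
    # Leaves whose frequency has not yet been absorbed by an HHH descendant.
    uncovered: Dict[Path, int] = dict(counts)
    hhhs: Dict[Path, int] = {}
    # Walk from the leaves (level == depth) up to the root (level == 0).
    for level in range(depth, -1, -1):
        discounted: Counter = Counter()
        for leaf, c in uncovered.items():
            discounted[leaf[:level]] += c
        # "Exceeds phi * n": strict comparison, as in the definition.
        found = {p: c for p, c in discounted.items() if c > phi * n}
        hhhs.update(found)
        # Discounting: leaves under a newly found HHH stop counting toward its ancestors.
        uncovered = {leaf: c for leaf, c in uncovered.items() if leaf[:level] not in found}
    return hhhs


def total_frequency(stream: Iterable[Path], prefix: Path) -> int:
    """Undiscounted frequency: every item whose path starts with `prefix`."""
    return sum(1 for item in stream if item[:len(prefix)] == prefix)


if __name__ == "__main__":
    stream = [("a", "x")] * 6 + [("a", "y")] + [("b", "z")] * 3
    print(exact_hhh(stream, 0.25))          # {('a', 'x'): 6, ('b', 'z'): 3}
    print(total_frequency(stream, ("a",)))  # 7
```

With φ = 0.25 and n = 10 the threshold is 2.5. The prefix ("a",) has total frequency 7 but discounted frequency 1 once its HHH child ("a", "x") is set aside, so it is not a φ-HHH. An estimate that bounds only total frequencies could still report it, which is the kind of false positive the abstract describes.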
Added 26 Jun 2010
Updated 26 Jun 2010
Type Conference
Year 2005
Where PODS
Authors John Hershberger, Nisheeth Shrivastava, Subhash Suri, Csaba D. Tóth