The eternal sunshine of the sketch data structure

In recent years, there has been significant research on compact data structures for summarizing large data streams. One family of such data structures is known as sketches. Sketches resemble the well-known Bloom filters [2] and use hashing to approximate the count associated with an arbitrary key in a data stream within fixed memory. One limitation is that sketches gradually saturate when they summarize long data streams, which can lead to large errors in the estimated key counts. In this work, we introduce two techniques to address this problem, based on the observation that real-world data streams often contain many transient keys that appear for short periods and do not reappear later. Once they enter the data structure, these keys contribute to hash collisions and thus reduce the estimation accuracy of sketches. Our techniques use a limited amount of additional memory to detect transient keys and to period...
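To make the saturation problem concrete, here is a minimal illustration of a count-min-style sketch, one common member of the sketch family. It is not taken from the paper and may differ from the variant the authors study. The class name, the `width` and `depth` parameters, and the choice of salted BLAKE2b as the hash are all assumptions made for this example.

```python
import hashlib


class CountMinSketch:
    """Illustrative count-min-style sketch (hypothetical, not the paper's code)."""

    def __init__(self, width=64, depth=4):
        self.width = width    # counters per row (assumed parameter)
        self.depth = depth    # number of independent hash rows (assumed parameter)
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # One hash function per row, derived by salting BLAKE2b with the row number.
        digest = hashlib.blake2b(
            key.encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        # Increment one counter in every row.
        for r in range(self.depth):
            self.rows[r][self._index(r, key)] += count

    def estimate(self, key):
        # Collisions can only inflate counters, so the minimum over rows
        # is an upper bound on the true count that is never too low.
        return min(self.rows[r][self._index(r, key)] for r in range(self.depth))


if __name__ == "__main__":
    sketch = CountMinSketch(width=16, depth=3)
    for _ in range(5):
        sketch.add("persistent-key")
    # Many keys seen only once stand in for the transient keys of the abstract.
    for i in range(500):
        sketch.add(f"transient-{i}")
    print("true count: 5, estimate:", sketch.estimate("persistent-key"))
```

With memory fixed at `width * depth` counters, every additional key that is seen only once still increments counters shared with other keys, so the estimate for a long-lived key can drift well above its true count as the stream grows. This is the saturation effect the abstract describes.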
Added 09 Dec 2010
Updated 09 Dec 2010
Type Journal
Year 2008
Where CN
Authors Xenofontas A. Dimitropoulos, Marc Ph. Stoecklin, Paul Hurley, Andreas Kind