Sciweavers

SIGMOD
2006
ACM

Approximately detecting duplicates for streaming data using stable bloom filters

Traditional duplicate elimination techniques are not applicable to many data stream applications. In general, precisely eliminating duplicates in an unbounded data stream is not feasible in many streaming scenarios. Therefore, we aim to approximately eliminate duplicates in streaming environments given limited space. Based on a well-known bitmap sketch, we introduce a data structure, the Stable Bloom Filter (SBF), and a novel and simple algorithm. The basic idea is as follows: since there is no way to store the whole history of the stream, SBF continuously evicts stale information so that it has room for more recent elements. After deriving some properties of SBF analytically, we show that a tight upper bound on the false positive rate is guaranteed. In our empirical study, we compare SBF to alternative methods. The results show that our method is superior in terms of both accuracy and time efficiency when a fixed small space and an acceptable false positive rate are given.
Fan Deng, Davood Rafiei
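The eviction idea in the abstract can be illustrated with a minimal Python sketch. The abstract does not spell out the update rule, so everything concrete here is an assumption made for illustration: the class name StableBloomFilter, the parameters num_cells, num_hashes, max_value and decay, the salted SHA-256 hashing, and the rule itself (decrement a few randomly chosen cells, then set the cells of the new element to a maximum value). It should be read as one plausible reading of the approach, not as the authors' reference implementation.

```python
import hashlib
import random


class StableBloomFilter:
    """Illustrative sketch of a Stable Bloom Filter-style duplicate detector.

    All parameter names and defaults are assumptions, not values from the paper.
    """

    def __init__(self, num_cells=1024, num_hashes=3, max_value=3, decay=10, seed=None):
        self.num_cells = num_cells    # fixed number of small counters (the space budget)
        self.num_hashes = num_hashes  # cells probed per element
        self.max_value = max_value    # value a cell is set to when an element is recorded
        self.decay = decay            # random cells decremented per arriving element
        self.cells = [0] * num_cells
        self.rng = random.Random(seed)

    def _indexes(self, item):
        # Derive num_hashes cell indexes from salted SHA-256 digests (assumed hashing scheme).
        data = str(item).encode("utf-8")
        indexes = []
        for salt in range(self.num_hashes):
            digest = hashlib.sha256(bytes([salt]) + data).digest()
            indexes.append(int.from_bytes(digest[:8], "big") % self.num_cells)
        return indexes

    def seen_before(self, item):
        """Report whether item looks like a duplicate, then record it."""
        indexes = self._indexes(item)
        # The element is reported as a duplicate only if every probed cell is non-zero.
        duplicate = all(self.cells[i] > 0 for i in indexes)
        # Evict stale information: decrement randomly chosen cells, never below zero.
        for _ in range(self.decay):
            j = self.rng.randrange(self.num_cells)
            if self.cells[j] > 0:
                self.cells[j] -= 1
        # Record the new element by resetting its cells to the maximum value.
        for i in indexes:
            self.cells[i] = self.max_value
        return duplicate


if __name__ == "__main__":
    sbf = StableBloomFilter(seed=0)
    for url in ["a.com", "b.com", "a.com"]:
        print(url, sbf.seen_before(url))
```

Because cells decay at random, an element that has not recurred for a long time may be forgotten (a false negative), while hash collisions can still cause false positives. In this sketch the decay and max_value parameters control how quickly old elements fade relative to the fixed space.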
Added 08 Dec 2009
Updated 08 Dec 2009
Type Conference
Year 2006
Where SIGMOD
Authors Fan Deng, Davood Rafiei