
VLDB
2002
ACM

Comparing Data Streams Using Hamming Norms (How to Zero In)

Massive data streams are now fundamental to many data processing applications. For example, Internet routers produce large scale diagnostic data streams. Such streams are rarely stored in traditional databases, and instead must be processed "on the fly" as they are produced. Similarly, sensor networks produce multiple data streams of observations from their sensors. There is growing focus on manipulating data streams, and hence, there is a need to identify basic operations of interest in managing data streams, and to support them efficiently. We propose computation of the Hamming norm as a basic operation of interest. The Hamming norm formalises ideas that are used throughout data processing. When applied to a single stream, the Hamming norm gives the number of distinct items that are present in that data stream, which is a statistic of great interest in databases. When applied to a pair of streams, the Hamming norm gives an important measure of (dis)similarity: the number o...
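As a small illustration of the quantity the abstract describes, the sketch below computes the Hamming norm exactly by storing one counter per item. It is not the paper's method: it uses memory proportional to the number of distinct items, which is exactly what a streaming algorithm is meant to avoid. The `(item, delta)` update format and the names `frequency_vector`, `hamming_norm`, and `hamming_distance` are assumptions made for this example. The two-stream function assumes the usual reading of the Hamming norm of a difference, namely the number of items whose net counts differ between the streams.

```python
from collections import Counter


def frequency_vector(updates):
    """Accumulate (item, delta) updates into a map of item -> net count."""
    counts = Counter()
    for item, delta in updates:
        counts[item] += delta
    return counts


def hamming_norm(counts):
    """Number of items with a non-zero net count (distinct items present)."""
    return sum(1 for value in counts.values() if value != 0)


def hamming_distance(counts_a, counts_b):
    """Number of items whose net counts differ between two streams (assumed reading)."""
    items = set(counts_a) | set(counts_b)
    return sum(1 for item in items if counts_a.get(item, 0) != counts_b.get(item, 0))


# Hypothetical example streams: "z" is inserted and then deleted in stream A.
stream_a = [("x", 1), ("y", 1), ("x", 1), ("z", 1), ("z", -1)]
stream_b = [("x", 2), ("y", 3), ("w", 1)]

vec_a = frequency_vector(stream_a)
vec_b = frequency_vector(stream_b)

print(hamming_norm(vec_a))             # distinct items present in stream A
print(hamming_distance(vec_a, vec_b))  # items on which the two streams disagree
```

Because deletions are allowed in this sketch, an item whose count returns to zero stops contributing to the norm, which is why the count is taken over non-zero entries rather than over every item ever seen.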
Added 23 Dec 2010
Updated 23 Dec 2010
Type Conference
Year 2002
Where VLDB
Authors Graham Cormode, Mayur Datar, Piotr Indyk, S. Muthukrishnan