Recent years have seen growing interest in effective algorithms for summarizing and querying massive, high-speed data streams. Randomized sketch synopses provide accurate approximations for general-purpose summaries of the streaming data distribution (e.g., wavelets). The focus of existing work has typically been on minimizing space requirements of the maintained synopsis -- however, to effectively support high-speed data-stream analysis, a crucial practical requirement is to also optimize: (1) the update time for incorporating a streaming data element in the sketch, and (2) the query time for producing an approximate summary (e.g., the top wavelet coefficients) from the sketch. Such time costs must be small enough to cope with rapid stream-arrival rates and the realtime querying requirements of typical streaming applications (e.g., ISP network monitoring). With cheap and plentiful memory, space is often only a secondary concern after query/update time costs. In this paper, we propose ...
Graham Cormode, Minos N. Garofalakis, Dimitris Sac