Tight results for clustering and summarizing data streams

In this paper we investigate algorithms and lower bounds for summarization problems over a single-pass data stream. In particular, we focus on histogram construction and K-center clustering. We provide a simple framework that improves upon all previous algorithms for these problems in either the space bound, the approximation factor, or the running time. The framework uses a notion of "streamstrapping", where summaries created for the initial prefixes of the data are used to develop better approximation algorithms. We also prove the first non-trivial lower bounds for these problems. We show that under the stricter requirement that an algorithm accurately approximate the error of every bucket or every cluster it produces, these upper bounds are almost the best possible. This property of accurate estimation holds for all known upper bounds on these problems.

Sudipto Guha

Better Approximation Algorithms | Database | ICDT 2009 | Non-trivial Lower Bounds | Upper Bounds |

Post Info

Added	21 Nov 2009
Updated	21 Nov 2009
Type	Conference
Year	2009
Where	ICDT
Authors	Sudipto Guha
