Approximate Clustering on Distributed Data Streams


Abstract: We investigate the problem of clustering on distributed data streams. In particular, we consider k-median clustering on stream data arriving at distributed sites which communicate through a routing tree. Distributed clustering on high-speed data streams is a challenging task due to limited communication capacity, storage space, and computing power at each site. In this paper, we propose a suite of algorithms for computing (1 + ε)-approximate k-median clustering over distributed data streams under three different topology settings: topology-oblivious, height-aware, and path-aware. Our algorithms reduce the maximum per-node transmission to polylog N (as opposed to Ω(N) for transmitting the raw data). We have simulated our algorithms on a distributed stream system with both real and synthetic datasets composed of millions of data points. In practice, our algorithms are able to reduce the data transmission to a small fraction of the original data. Moreover, our results indicate that the ...
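To make the objective in the abstract concrete, here is a minimal Python sketch of the k-median cost and of what a (1 + ε)-approximation guarantee means. It is not the paper's algorithm: it is a brute-force, single-machine illustration on 1-D points, and it assumes the discrete variant in which centers are chosen from the input points. The function names `kmedian_cost`, `brute_force_kmedian`, and `is_approximation` are hypothetical, introduced only for this example.

```python
from itertools import combinations


def kmedian_cost(points, centers):
    """k-median cost: sum of each point's distance to its nearest center (1-D)."""
    return sum(min(abs(p - c) for c in centers) for p in points)


def brute_force_kmedian(points, k):
    """Assumption: discrete k-median, centers restricted to the input points.

    Tries every k-subset, so it is only usable on tiny inputs.
    """
    candidates = sorted(set(points))
    best = min(combinations(candidates, k),
               key=lambda cs: kmedian_cost(points, cs))
    return list(best), kmedian_cost(points, best)


def is_approximation(points, centers, k, eps):
    """True if `centers` costs at most (1 + eps) times the brute-force optimum."""
    _, optimal_cost = brute_force_kmedian(points, k)
    return kmedian_cost(points, centers) <= (1 + eps) * optimal_cost


points = [1, 2, 3, 10, 11, 12]
centers, cost = brute_force_kmedian(points, 2)
```

On this toy input, a candidate solution such as `[2, 10]` can be checked with `is_approximation(points, [2, 10], 2, eps)` for different values of `eps`; a smaller `eps` demands a cost closer to the optimum.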

Qi Zhang, Jinze Liu, Wei Wang 0010


Data Streams | Database | ICDE 2008 | Speed Data Streams | Stream Data |


Post Info

Added	01 Nov 2009
Updated	01 Nov 2009
Type	Conference
Year	2008
Where	ICDE
Authors	Qi Zhang, Jinze Liu, Wei Wang 0010
