In this paper, we investigate how to scale hierarchical clustering methods (such as OPTICS) to extremely large databases by utilizing data compression methods (such as BIRCH or random sampling). We propose a three-step procedure: 1) compress the data into suitable representative objects; 2) apply the hierarchical clustering algorithm only to these objects; 3) recover the clustering structure for the whole data set, based on the result for the compressed data. The key issue in this approach is to design compressed data items such that not only can a hierarchical clustering algorithm be applied to them, but they also contain enough information to infer the clustering structure of the original data set in the third step. This is crucial because the results of hierarchical clustering algorithms, when applied naively to a random sample or to the clustering features (CFs) generated by BIRCH, deteriorate rapidly for higher compression rates. This is due to three key problems, which we identify....
Markus M. Breunig, Hans-Peter Kriegel, Peer Kröger
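To make the three-step procedure concrete, the following is a minimal sketch (not the paper's own method) of the generic pipeline, assuming random sampling as the compression step, SciPy's single-link hierarchical clustering on the representatives, and a naive nearest-representative assignment as a stand-in for the structure-recovery step; all function choices and parameters here are illustrative assumptions.

```python
# Minimal illustrative sketch of the compress / cluster / recover pipeline.
# Assumptions (not from the paper): random sampling as compression,
# SciPy single-link clustering, nearest-representative label propagation.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
data = rng.normal(size=(100_000, 2))           # stand-in for a large database

# Step 1: compress the data into representative objects (here: a random sample)
sample_idx = rng.choice(len(data), size=1_000, replace=False)
reps = data[sample_idx]

# Step 2: apply the hierarchical clustering algorithm only to the representatives
dendrogram = linkage(reps, method="single")    # single-link hierarchy
rep_labels = fcluster(dendrogram, t=5, criterion="maxclust")

# Step 3: recover a clustering for the whole data set by assigning every
# original object to the cluster of its nearest representative
tree = cKDTree(reps)
_, nearest_rep = tree.query(data, k=1)
full_labels = rep_labels[nearest_rep]
print(np.bincount(full_labels)[1:])            # cluster sizes over all objects
```

As the abstract notes, this naive variant is exactly the setting in which the clustering quality deteriorates at higher compression rates; the sketch only illustrates the overall pipeline structure, not a solution to the problems the paper identifies.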