Sciweavers

SIGMOD
1996
ACM

BIRCH: An Efficient Data Clustering Method for Very Large Databases

Finding useful patterns in large datasets has attracted considerable interest recently, and one of the most widely studied problems in this area is the identification of clusters, or densely populated regions, in a multi-dimensional dataset. Prior work does not adequately address the problem of large datasets and minimization of I/O costs. This paper presents a data clustering method named BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies), and demonstrates that it is especially suitable for very large databases. BIRCH incrementally and dynamically clusters incoming multi-dimensional metric data points to try to produce the best quality clustering with the available resources (i.e., available memory and time constraints). BIRCH can typically find a good clustering with a single scan of the data, and improve the quality further with a few additional scans. BIRCH is also the first clustering algorithm proposed in the database area to handle "no...
Tian Zhang, Raghu Ramakrishnan, Miron Livny
Added 08 Aug 2010
Updated 08 Aug 2010
Type Conference
Year 1996
Where SIGMOD
Authors Tian Zhang, Raghu Ramakrishnan, Miron Livny