
KDD
1999
ACM

CACTUS - Clustering Categorical Data Using Summaries

Clustering is an important data mining problem. Most of the earlier work on clustering focussed on numeric attributes which have a natural ordering on their attribute values. Recently, clustering data with categorical attributes, whose attribute values do not have a natural ordering, has received some attention. However, previous algorithms do not give a formal description of the clusters they discover and some of them assume that the user post-processes the output of the algorithm to identify the final clusters. In this paper, we introduce a novel formalization of a cluster for categorical attributes by generalizing a definition of a cluster for numerical attributes. We then describe a very fast summarization-based algorithm called CACTUS that discovers exactly such clusters in the data. CACTUS has two important characteristics. First, the algorithm requires only two scans of the dataset, and hence is very fast and scalable. Our experiments on a variety of datasets show that CACTUS ...
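The abstract does not say what the summaries contain. As a rough illustration only, the sketch below assumes that a summary is a table of co-occurrence counts for pairs of attribute values, built in a single pass over the records. The function name `build_pair_summary` and the sample rows are hypothetical, and this is not the CACTUS algorithm itself; it only shows why a count-based summary can be collected in one scan and is typically much smaller than the dataset.

```python
from collections import Counter
from itertools import combinations


def build_pair_summary(rows):
    """Scan the rows once and count co-occurring (attribute, value) pairs.

    Hypothetical helper for illustration; not taken from the paper.
    """
    pair_counts = Counter()
    n_rows = 0
    for row in rows:
        n_rows += 1
        # Sort by attribute name so each pair has one canonical key.
        items = sorted(row.items())
        for (a1, v1), (a2, v2) in combinations(items, 2):
            pair_counts[(a1, v1, a2, v2)] += 1
    return pair_counts, n_rows


# Made-up categorical records, used only to exercise the function.
rows = [
    {"color": "red", "shape": "round"},
    {"color": "red", "shape": "round"},
    {"color": "blue", "shape": "square"},
]
summary, n = build_pair_summary(rows)
```

The table grows with the number of distinct value pairs rather than the number of records, so later steps could work from the counts without rereading the data.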
Venkatesh Ganti, Johannes Gehrke, Raghu Ramakrishnan
Added 04 Aug 2010
Updated 04 Aug 2010
Type Conference
Year 1999
Where KDD
Authors Venkatesh Ganti, Johannes Gehrke, Raghu Ramakrishnan