ICDE
1999
IEEE

ROCK: A Robust Clustering Algorithm for Categorical Attributes

Clustering, in data mining, is useful to discover distribution patterns in the underlying data. Clustering algorithms usually employ a distance-metric-based (e.g., Euclidean) similarity measure in order to partition the database such that data points in the same partition are more similar than points in different partitions. In this paper, we study clustering algorithms for data with boolean and categorical attributes. We show that traditional clustering algorithms that use distances between points for clustering are not appropriate for boolean and categorical attributes. Instead, we propose a novel concept of links to measure the similarity/proximity between a pair of data points. We develop a robust hierarchical clustering algorithm ROCK that employs links and not distances when merging clusters. Our methods naturally extend to non-metric similarity measures that are relevant in situations where a domain expert/similarity table is the only source of knowledge. In addition to present...
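The link concept can be illustrated with a minimal sketch. It assumes (from our reading of the full paper, not from this abstract) that two points are neighbors when their Jaccard coefficient is at least a threshold theta, and that the link between two points is the number of neighbors they share. The function names `jaccard`, `neighbors`, and `links`, the parameter `theta`, and the toy data are hypothetical; this is not the authors' implementation and it omits ROCK's hierarchical merge step.

```python
from itertools import combinations


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard coefficient |a & b| / |a | b| for two sets of categorical items."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def neighbors(points, theta):
    """For each point index, return the set of indices with similarity >= theta.

    Assumption: every point counts as its own neighbor (similarity 1.0 with itself).
    """
    nbrs = [{i} for i in range(len(points))]
    for i, j in combinations(range(len(points)), 2):
        if jaccard(points[i], points[j]) >= theta:
            nbrs[i].add(j)
            nbrs[j].add(i)
    return nbrs


def links(points, theta):
    """Map each pair (i, j) with i < j to the number of common neighbors."""
    nbrs = neighbors(points, theta)
    return {
        (i, j): len(nbrs[i] & nbrs[j])
        for i, j in combinations(range(len(points)), 2)
    }


if __name__ == "__main__":
    # Toy market-basket transactions: two groups of similar baskets.
    pts = [frozenset("abc"), frozenset("abd"), frozenset("abe"),
           frozenset("xyz"), frozenset("xyw")]
    result = links(pts, theta=0.5)
    print(result[(0, 1)])  # → 3
    print(result[(3, 4)])  # → 2
    print(result[(0, 3)])  # → 0
```

Because a link count aggregates neighborhood information rather than a single pairwise distance, baskets in the same group (indices 0 to 2) share 3 links while baskets from different groups share none. The quadratic pairwise loop is only meant for tiny illustrative inputs.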
Sudipto Guha, Rajeev Rastogi, Kyuseok Shim
Added 01 Nov 2009
Updated 01 Nov 2009
Type Conference
Year 1999
Where ICDE
Authors Sudipto Guha, Rajeev Rastogi, Kyuseok Shim