LIMBO: Scalable Clustering of Categorical Data

Abstract. Clustering is a problem of great practical importance in numerous applications. The problem of clustering becomes more challenging when the data is categorical, that is, when there is no inherent distance measure between data values. We introduce LIMBO, a scalable hierarchical categorical clustering algorithm that builds on the Information Bottleneck (IB) framework for quantifying the relevant information preserved when clustering. As a hierarchical algorithm, LIMBO has the advantage that it can produce clusterings of different sizes in a single execution. We use the IB framework to define a distance measure for categorical tuples and we also present a novel distance measure for categorical attribute values. We show how the LIMBO algorithm can be used to cluster both tuples and values. LIMBO handles large data sets by producing a memory bounded summary model for the data. We present an experimental evaluation of LIMBO, and we study how clustering quality compares to other cat...
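
The IB-based distance mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard agglomerative Information Bottleneck merge cost: the information lost by merging two clusters is their combined weight times the Jensen-Shannon divergence of their conditional distributions over attribute values. It also assumes each tuple is modelled as a uniform distribution over the values it contains. The function names (`tuple_distribution`, `merge_cost`) and the toy vocabulary are hypothetical and are not taken from this listing.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def tuple_distribution(tup, vocabulary):
    """Assumed representation: a tuple spreads probability uniformly over its values."""
    return [1.0 / len(tup) if value in tup else 0.0 for value in vocabulary]


def merge_cost(w1, p1, w2, p2):
    """Information loss of merging two clusters with weights w1, w2 (both > 0).

    Computed as (w1 + w2) times the weighted Jensen-Shannon divergence of p1 and p2.
    """
    w = w1 + w2
    pi1, pi2 = w1 / w, w2 / w
    mix = [pi1 * a + pi2 * b for a, b in zip(p1, p2)]
    js = pi1 * kl(p1, mix) + pi2 * kl(p2, mix)
    return w * js


# Toy data: four tuples in total, so each single tuple has weight 1/4.
vocabulary = ["red", "blue", "small", "large"]
t1 = tuple_distribution(("red", "small"), vocabulary)
t2 = tuple_distribution(("red", "large"), vocabulary)
t3 = tuple_distribution(("blue", "large"), vocabulary)

cost_similar = merge_cost(0.25, t1, 0.25, t2)    # tuples sharing one value
cost_disjoint = merge_cost(0.25, t1, 0.25, t3)   # tuples sharing no value
print(cost_similar, cost_disjoint)
```

Under these assumptions, tuples that share attribute values are cheaper to merge than tuples that share none, which is the sense in which the merge cost behaves as a distance between categorical tuples.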

Periklis Andritsos, Panayiotis Tsaparas, Renée J. Miller, Kenneth C. Sevcik

Categorical Clustering Algorithm | Categorical Clustering Algorithms | Database | EDBT 2004 | LIMBO Algorithm |

» CLOPE: A Fast and Effective Clustering Algorithm for Transactional Data

» A Fast Clustering Algorithm to Cluster Very Large Categorical Data Sets in Data Mining

» SCLOPE: An Algorithm for Clustering Data Streams of Categorical Attributes

» ROCK: A Robust Clustering Algorithm for Categorical Attributes

» CACTUS: Clustering Categorical Data Using Summaries

» PoClustering: Lossless Clustering of Dissimilarity Data

» Integrative Parameter-Free Clustering of Data with Mixed Type Attributes

» Efficiently Clustering Transactional Data with Weighted Coverage Density

Post Info

Added	08 Dec 2009
Updated	08 Dec 2009
Type	Conference
Year	2004
Where	EDBT
Authors	Periklis Andritsos, Panayiotis Tsaparas, Renée J. Miller, Kenneth C. Sevcik
