Information-theoretic co-clustering

Two-dimensional contingency or co-occurrence tables arise frequently in important applications such as text, web-log and market-basket data analysis. A basic problem in contingency table analysis is co-clustering: simultaneous clustering of the rows and columns. A novel theoretical formulation views the contingency table as an empirical joint probability distribution of two discrete random variables and poses the co-clustering problem as an optimization problem in information theory -- the optimal co-clustering maximizes the mutual information between the clustered random variables subject to constraints on the number of row and column clusters. We present an innovative co-clustering algorithm that monotonically increases the preserved mutual information by intertwining both the row and column clusterings at all stages. Using the practical example of simultaneous word-document clustering, we demonstrate that our algorithm works well in practice, especially in the presence of sparsity ...
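The formulation in the abstract can be illustrated with a minimal sketch. It assumes a tiny hand-made count table, normalizes it into an empirical joint distribution, and compares the mutual information preserved by two candidate co-clusterings. The function names (`normalize`, `mutual_information`, `cluster_joint`) and the toy data are hypothetical and not taken from the paper. The sketch only evaluates the objective for fixed cluster assignments; it does not implement the authors' iterative algorithm.

```python
import math


def normalize(counts):
    """Turn a contingency table of counts into an empirical joint p(x, y)."""
    total = sum(sum(row) for row in counts)
    return [[c / total for c in row] for row in counts]


def mutual_information(joint):
    """Mutual information I(X;Y) in bits for a joint given as a list of rows."""
    row_marg = [sum(row) for row in joint]
    col_marg = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:  # zero cells contribute nothing (0 * log 0 := 0)
                mi += p * math.log2(p / (row_marg[i] * col_marg[j]))
    return mi


def cluster_joint(joint, row_labels, col_labels, k, l):
    """Aggregate p(x, y) into the k-by-l joint of the clustered variables."""
    agg = [[0.0] * l for _ in range(k)]
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            agg[row_labels[i]][col_labels[j]] += p
    return agg


# Hypothetical toy table: rows could be words, columns documents.
counts = [
    [5, 5, 0, 0],
    [5, 5, 0, 0],
    [0, 0, 5, 5],
    [0, 0, 5, 5],
]
p = normalize(counts)

full_mi = mutual_information(p)
# Co-clustering that follows the block structure of the table.
good_mi = mutual_information(cluster_joint(p, [0, 0, 1, 1], [0, 0, 1, 1], 2, 2))
# Co-clustering that mixes the two blocks.
bad_mi = mutual_information(cluster_joint(p, [0, 1, 0, 1], [0, 1, 0, 1], 2, 2))

print(f"I(X;Y) of the full table:   {full_mi:.3f} bits")
print(f"preserved by good clusters: {good_mi:.3f} bits")
print(f"preserved by bad clusters:  {bad_mi:.3f} bits")
```

On this toy table the block-aligned co-clustering preserves all of the mutual information of the original table, while the mixed one preserves none. Clustering can never increase mutual information, so the preserved value is always bounded above by that of the full table.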

Inderjit S. Dhillon, Subramanyam Mallela, Dharmendra S. Modha

Contingency Table Analysis | Data Mining | KDD 2003 | Simultaneous Clustering | Two-dimensional Contingency |

Post Info

Added	30 Nov 2009
Updated	30 Nov 2009
Type	Conference
Year	2003
Where	KDD
Authors	Inderjit S. Dhillon, Subramanyam Mallela, Dharmendra S. Modha
