Sciweavers

KDD
2003
ACM

Information-theoretic co-clustering

14 years 12 months ago
Information-theoretic co-clustering
Two-dimensional contingency or co-occurrence tables arise frequently in important applications such as text, web-log and market-basket data analysis. A basic problem in contingency table analysis is co-clustering: simultaneous clustering of the rows and columns. A novel theoretical formulation views the contingency table as an empirical joint probability distribution of two discrete random variables and poses the co-clustering problem as an optimization problem in information theory -- the optimal co-clustering maximizes the mutual information between the clustered random variables subject to constraints on the number of row and column clusters. We present an innovative co-clustering algorithm that monotonically increases the preserved mutual information by intertwining both the row and column clusterings at all stages. Using the practical example of simultaneous word-document clustering, we demonstrate that our algorithm works well in practice, especially in the presence of sparsity ...
Inderjit S. Dhillon, Subramanyam Mallela, Dharmend
Added 30 Nov 2009
Updated 30 Nov 2009
Type Conference
Year 2003
Where KDD
Authors Inderjit S. Dhillon, Subramanyam Mallela, Dharmendra S. Modha
Comments (0)