CRD: fast co-clustering on large datasets utilizing sampling-based matrix decomposition


The problem of simultaneously clustering columns and rows (co-clustering) arises in important applications, such as text data mining, microarray analysis, and recommendation system analysis. Compared with classical clustering algorithms, co-clustering algorithms have been shown to be more effective in discovering hidden clustering structures in the data matrix. The complexity of previous co-clustering algorithms is usually O(m × n), where m and n are the numbers of rows and columns in the data matrix, respectively. This limits their applicability to data matrices with a large number of rows and columns. Moreover, some huge datasets cannot be held entirely in main memory during co-clustering, which violates an assumption made by previous algorithms. In this paper, we propose a general framework for fast co-clustering of large datasets, CRD. By utilizing recently developed sampling-based matrix decomposition methods, CRD achieves an execution time linear in m and n. Also, CRD d...
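The abstract does not say which sampling-based decomposition CRD builds on, so the following is only a generic illustration of the idea, not the authors' method. It sketches length-squared column sampling, a common building block of sampling-based decompositions: columns are drawn with probability proportional to their squared norm and rescaled so that the small sampled matrix preserves the original's overall energy. The function name `sample_columns` and its parameters are hypothetical.

```python
import random


def sample_columns(matrix, c, seed=0):
    """Draw c columns with probability proportional to squared column norm.

    Illustrative sketch only; not taken from the CRD paper.
    `matrix` is a list of rows (m x n), `c` is the number of samples.
    Returns the sampled column indices and the rescaled columns.
    """
    rng = random.Random(seed)
    n_cols = len(matrix[0])

    # Squared Euclidean norm of every column (one pass over the data).
    sq_norms = [sum(row[j] ** 2 for row in matrix) for j in range(n_cols)]
    total = sum(sq_norms)  # squared Frobenius norm of the matrix
    probs = [s / total for s in sq_norms]

    # Sample column indices with replacement according to probs.
    idx = rng.choices(range(n_cols), weights=probs, k=c)

    # Rescale each sampled column by 1 / sqrt(c * p_j).
    cols = []
    for j in idx:
        scale = 1.0 / (c * probs[j]) ** 0.5
        cols.append([row[j] * scale for row in matrix])
    return idx, cols
```

With this rescaling, every sampled column has squared norm equal to the matrix's squared Frobenius norm divided by c, so the sampled matrix keeps the same total squared norm as the original. Later steps can then work on c columns instead of all n.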

Feng Pan, Xiang Zhang, Wei Wang 0010


Classical Clustering Algorithms | Data Matrix | Database | Matrix Decomposition Methods | SIGMOD 2008 |


Added	08 Dec 2009
Updated	08 Dec 2009
Type	Conference
Year	2008
Where	SIGMOD
Authors	Feng Pan, Xiang Zhang, Wei Wang 0010
