Sciweavers

SIGMOD
2008
ACM

CRD: fast co-clustering on large datasets utilizing sampling-based matrix decomposition

15 years 17 days ago
CRD: fast co-clustering on large datasets utilizing sampling-based matrix decomposition
The problem of simultaneously clustering columns and rows (coclustering) arises in important applications, such as text data mining, microarray analysis, and recommendation system analysis. Compared with the classical clustering algorithms, co-clustering algorithms have been shown to be more effective in discovering hidden clustering structures in the data matrix. The complexity of previous co-clustering algorithms is usually O(m ? n), where m and n are the numbers of rows and columns in the data matrix respectively. This limits their applicability to data matrices involving a large number of columns and rows. Moreover, some huge datasets can not be entirely held in main memory during co-clustering which violates the assumption made by the previous algorithms. In this paper, we propose a general framework for fast co-clustering large datasets, CRD. By utilizing recently developed sampling-based matrix decomposition methods, CRD achieves an execution time linear in m and n. Also, CRD d...
Feng Pan, Xiang Zhang, Wei Wang 0010
Added 08 Dec 2009
Updated 08 Dec 2009
Type Conference
Year 2008
Where SIGMOD
Authors Feng Pan, Xiang Zhang, Wei Wang 0010
Comments (0)