ICDE
2008
IEEE

A General Framework for Fast Co-clustering on Large Datasets Using Matrix Decomposition

Abstract-- Simultaneously clustering the columns and rows (co-clustering) of a large data matrix is an important problem with wide applications, such as document mining, microarray analysis, and recommendation systems. Several co-clustering algorithms have been shown to be effective in discovering hidden clustering structures in the data matrix. For a data matrix of m rows and n columns, the time complexity of these methods is usually on the order of m × n (if not higher). This limits their applicability to data matrices involving a large number of columns and rows. Moreover, an implicit assumption made by existing co-clustering methods is that the whole data matrix needs to be held in main memory. In this paper, we propose a general framework, CRD, for co-clustering large datasets utilizing recently developed sampling-based matrix decomposition methods. The time complexity of our approach is linear in m and n, and it does not require the whole data matrix to be held in main memory. Extensive experi...
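The abstract does not describe CRD's sampling procedure, so the following is only a minimal sketch of the general idea behind sampling-based matrix decomposition, not the paper's method. It assumes norm-squared sampling as used by CUR-type decompositions: rows and columns are picked with probability proportional to their squared Euclidean norms. The matrix is streamed row by row, so the full m × n matrix never has to be held in memory; only m row norms and n column accumulators are kept. The function name `norm_squared_sampling` and the parameters `c` (columns to sample) and `r` (rows to sample) are hypothetical.

```python
import random
from typing import Iterable, List, Sequence, Tuple


def norm_squared_sampling(
    rows: Iterable[Sequence[float]], c: int, r: int, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper (not CRD itself): sample r row indices and c column
    indices with replacement, weighted by squared Euclidean norms.

    Assumes a rectangular matrix with at least one nonzero entry, given as an
    iterable of rows so it can be read in a single streaming pass.
    """
    row_norms: List[float] = []  # one squared norm per row (O(m) memory)
    col_norms: List[float] = []  # running squared norm per column (O(n) memory)
    for row in rows:
        if not col_norms:
            col_norms = [0.0] * len(row)
        row_norms.append(sum(x * x for x in row))
        for j, x in enumerate(row):
            col_norms[j] += x * x

    rng = random.Random(seed)
    # Rows/columns with larger squared norm are proportionally more likely
    # to be picked; zero-norm rows/columns are never picked.
    sampled_rows = rng.choices(range(len(row_norms)), weights=row_norms, k=r)
    sampled_cols = rng.choices(range(len(col_norms)), weights=col_norms, k=c)
    return sampled_rows, sampled_cols


if __name__ == "__main__":
    matrix = [[3.0, 0.0, 1.0], [0.0, 0.0, 2.0], [1.0, 0.0, 0.0]]
    print(norm_squared_sampling(iter(matrix), c=2, r=2))
```

In a CUR-style scheme, a later step would operate on the sampled columns and rows (and their intersection) instead of the full matrix. That reduction is where the savings come from. How CRD builds and co-clusters on these pieces is described in the paper itself.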
Feng Pan, Xiang Zhang, Wei Wang 0010
Added 01 Nov 2009
Updated 01 Nov 2009
Type Conference
Year 2008
Where ICDE