A General Framework for Fast Co-clustering on Large Datasets Using Matrix Decomposition


Abstract-- Simultaneously clustering the columns and rows of a large data matrix (co-clustering) is an important problem with wide applications, such as document mining, microarray analysis, and recommendation systems. Several co-clustering algorithms have been shown to be effective in discovering hidden clustering structures in the data matrix. For a data matrix of m rows and n columns, the time complexity of these methods is usually on the order of m × n, if not higher. This limits their applicability to data matrices with a large number of rows and columns. Moreover, existing co-clustering methods implicitly assume that the whole data matrix can be held in main memory. In this paper, we propose a general framework, CRD, for co-clustering large datasets using recently developed sampling-based matrix decomposition methods. The time complexity of our approach is linear in m and n, and it does not require the whole data matrix to be in main memory. Extensive experi...
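
To make the idea of a sampling-based matrix decomposition concrete, the following is a minimal Python sketch of a generic CUR-style (skeleton) decomposition. It is not the CRD framework from the paper, whose details are not given in the abstract. The function names, the squared-norm sampling rule, the sample size `k`, and the toy block matrix are all assumptions made for illustration. The sketch also keeps the full matrix in memory and scans it once to compute norms, so it does not reproduce the memory or complexity claims above.

```python
import numpy as np


def sample_by_squared_norm(A, k, axis, rng):
    """Draw k indices along `axis` with probability proportional to squared norm.

    axis=0 samples rows, axis=1 samples columns (hypothetical helper).
    """
    sq_norms = np.square(A).sum(axis=1 - axis)
    probs = sq_norms / sq_norms.sum()
    return rng.choice(A.shape[axis], size=k, replace=True, p=probs)


def cur_sketch(A, k, seed=0):
    """Return C (m x k), U (k x k), R (k x n) such that C @ U @ R approximates A."""
    rng = np.random.default_rng(seed)
    rows = sample_by_squared_norm(A, k, axis=0, rng=rng)
    cols = sample_by_squared_norm(A, k, axis=1, rng=rng)
    C = A[:, cols]                 # sampled columns of A
    R = A[rows, :]                 # sampled rows of A
    W = A[np.ix_(rows, cols)]      # intersection of the sampled rows and columns
    # An explicit cutoff keeps round-off singular values from being inverted.
    U = np.linalg.pinv(W, rcond=1e-10)
    return C, U, R


def make_block_matrix(m=60, n=40):
    """Toy matrix with two row clusters and two column clusters (assumed data)."""
    A = np.zeros((m, n))
    A[: m // 2, : n // 2] = 1.0
    A[m // 2:, n // 2:] = 1.0
    return A


A = make_block_matrix()
C, U, R = cur_sketch(A, k=20)
rel_err = np.linalg.norm(A - C @ U @ R) / np.linalg.norm(A)
print(f"relative reconstruction error: {rel_err:.2e}")
```

On this toy matrix the rows of `C` and the columns of `R` already separate into the two row and column groups, so clustering could in principle run on these small factors rather than on the full m × n matrix. Whether CRD proceeds this way is not stated in the abstract.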

Feng Pan, Xiang Zhang, Wei Wang 0010


Data Matrix | Database | ICDE 2008 | Large Data Matrix | Matrix Decomposition Methods |


Post Info

Added	01 Nov 2009
Updated	01 Nov 2009
Type	Conference
Year	2008
Where	ICDE
Authors	Feng Pan, Xiang Zhang, Wei Wang 0010
