The K-means clustering problem seeks to partition the columns of a data matrix into subsets, such that columns in the same subset are ‘close’ to each other. The co-clustering problem seeks to simultaneously partition the rows and columns of a matrix to produce ‘coherent’ groups called co-clusters. Co-clustering has recently found numerous applications in diverse areas. The concept readily generalizes to higher-way data sets (e.g., by adding a temporal dimension). Starting from K-means, we show how co-clustering can be formulated as a constrained multilinear decomposition with sparse latent factors. In the case of three- and higher-way data, this corresponds to a PARAFAC decomposition with sparse latent factors. This is important because PARAFAC is unique under mild conditions, and sparsity further improves identifiability. This allows us to uniquely unravel a large number of possibly overlapping co-clusters that are hidden in the data. Interestingly, the imposition of latent sparsity ...
Evangelos E. Papalexakis, Nicholas D. Sidiropoulos
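The progression the abstract describes, from K-means to sparse PARAFAC, can be sketched in equations as follows. This is only an illustrative outline: the symbols (X, M, A, a_k, b_k, c_k, K, lambda) and the l1 penalty used to promote sparsity are assumptions chosen for this sketch, not notation or choices confirmed by the abstract.

```latex
% Assumed notation: X is I x J, its J columns are the data points to be clustered.
% K-means as a constrained matrix factorization:
%   M (I x K) holds the centroids, A (J x K) holds the cluster assignments.
\min_{\mathbf{M},\,\mathbf{A}} \;
  \left\| \mathbf{X} - \mathbf{M}\mathbf{A}^{T} \right\|_F^2
\quad \text{s.t.} \quad
  \mathbf{A} \in \{0,1\}^{J \times K}, \;\; \mathbf{A}\mathbf{1}_K = \mathbf{1}_J .

% Co-clustering (two-way): a sum of K rank-one terms with sparse factors.
%   The support of a_k selects the rows and the support of b_k selects the
%   columns of co-cluster k; supports may overlap across different k.
\mathbf{X} \approx \sum_{k=1}^{K} \mathbf{a}_k \mathbf{b}_k^{T},
\qquad \mathbf{a}_k,\ \mathbf{b}_k \ \text{sparse}.

% Three-way data (I x J x N): PARAFAC with sparse latent factors,
%   where \circ denotes the vector outer product.
\underline{\mathbf{X}} \approx \sum_{k=1}^{K}
  \mathbf{a}_k \circ \mathbf{b}_k \circ \mathbf{c}_k .

% One possible way to promote sparsity (an assumption of this sketch):
%   an l1-penalized least-squares fit with weight \lambda > 0.
\min_{\{\mathbf{a}_k,\mathbf{b}_k,\mathbf{c}_k\}} \;
  \Big\| \underline{\mathbf{X}} - \sum_{k=1}^{K}
     \mathbf{a}_k \circ \mathbf{b}_k \circ \mathbf{c}_k \Big\|_F^2
  + \lambda \sum_{k=1}^{K} \left(
     \|\mathbf{a}_k\|_1 + \|\mathbf{b}_k\|_1 + \|\mathbf{c}_k\|_1 \right).
```

Read this way, K-means is the special case in which each column belongs to exactly one cluster, while replacing the hard assignment constraint with sparse factors on every mode allows rows, columns, and (for three-way data) slices to belong to several co-clusters or to none.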