We describe the design and implementation of a high performance cloud that we have used to archive, analyze and mine large distributed data sets. By a cloud, we mean an infrastruc...
In this paper, we propose a robust model selection criterion for mixtures of subspaces called minimum effective dimension (MED). Previous information-theoretic model selection cri...
Different algorithms have been proposed in the literature to cluster gene expression data, however there is no single algorithm that can be considered the best one independently on...
In this paper, we propose the "Democratic Classifier", a simple, democracy-inspired patternbased classification algorithm that uses very short patterns for classificatio...
Given a data matrix, the problem of finding dense/uniform sub-blocks in the matrix is becoming important in several applications. The problem is inherently combinatorial since th...