We study the problem of estimating the Earth Mover’s Distance (EMD) between probability distributions when given access only to samples of the distributions. We give closeness t...
Khanh Do Ba, Huy L. Nguyen, Huy N. Nguyen, Ronitt ...
Abstract. Clustering algorithms based on a matrix of pairwise similarities (kernel matrix) for the data are widely known and used, a particularly popular class being spectral clust...
Background: Microarray technologies produced large amount of data. The hierarchical clustering is commonly used to identify clusters of co-expressed genes. However, microarray dat...
Alexandre G. de Brevern, Serge A. Hazout, Alain Ma...
Clustering is ill-defined. Unlike supervised learning where labels lead to crisp performance criteria such as accuracy and squared error, clustering quality depends on how the cl...
Rich Caruana, Mohamed Farid Elhawary, Nam Nguyen, ...
Data is often collected over a distributed network, but in many cases, is so voluminous that it is impractical and undesirable to collect it in a central location. Instead, we mus...