The Expectation Maximization EM algorithm is an iterative procedure for maximum likelihood parameter estimation from data sets with missing or hidden variables 2 . It has been app...
Approximate Nearest Neighbor (ANN) methods such as Locality Sensitive Hashing, Semantic Hashing, and Spectral Hashing, provide computationally ecient procedures for nding objects...
Identifying the patterns of large data sets is a key requirement in data mining. A powerful technique for this purpose is the principal component analysis (PCA). PCA-based clusteri...
Background: Solexa/Illumina short-read ultra-high throughput DNA sequencing technology produces millions of short tags (up to 36 bases) by parallel sequencing-by-synthesis of DNA ...
Jacques Rougemont, Arnaud Amzallag, Christian Isel...
Distributed data mining deals with the problem of data analysis in environments with distributed data, computing nodes, and users. Peer-to-peer computing is emerging as a new dist...
Souptik Datta, Kanishka Bhaduri, Chris Giannella, ...