The MapReduce programming model simplifies large-scale data processing on commodity clusters by having users specify a map function that processes input key/value pairs to generate...
The k-means algorithm with cosine similarity, also known as the spherical k-means algorithm, is a popular method for clustering document collections. However, spherical k-means ca...
Spectral clustering, a widely used method for organizing data, relies only on pairwise similarity measurements. This makes its application to non-vectorial data straightforw...
Fabian L. Wauthier, Nebojsa Jojic, Michael I. Jord...
We present an efficient, fully automated algorithm to assemble ESTs into full-length cDNA sequences that represent the complete coding regions of a gene. Our EST clustering algori...
Arthur Grossman, Charles Hauser, Hilary J. Holz, J...
We present an algorithm for unsupervised induction of labeled parse trees. The algorithm has three stages: bracketing, initial labeling, and label clustering. Bracketing is done f...