We present a novel approach to managing redundancy in sequence databanks such as GenBank. We store clusters of near-identical sequences as a representative union-sequence and a se...
Michael Cameron, Yaniv Bernstein, Hugh E. Williams
Information-theoretic clustering aims to exploit information theoretic measures as the clustering criteria. A common practice on this topic is so-called INFO-K-means, which perfor...
As valuable software systems get old, reverse engineering becomes more and more important to the companies that have to maintain the code. Clustering is a key activity in reverse ...
The problem of finding clusters in data is challenging when clusters are of widely differing sizes, densities and shapes, and when the data contains large amounts of noise and out...
Most traditional text clustering methods are based on "bag of words" (BOW) representation based on frequency statistics in a set of documents. BOW, however, ignores the ...
Jian Hu, Lujun Fang, Yang Cao, Hua-Jun Zeng, Hua L...