We present a novel approach to managing redundancy in sequence databanks such as GenBank. We store clusters of near-identical sequences as a representative union-sequence and a se...
Michael Cameron, Yaniv Bernstein, Hugh E. Williams
Background: There are some limitations associated with conventional clustering methods for short time-course gene expression data. The current algorithms require prior domain know...
Tags have recently become popular as a means of annotating and organizing Web pages and blog entries. Advocates of tagging argue that the use of tags produces a 'folksonomy...
We present a freely available benchmark dataset for audio classification and clustering. This dataset consists of 10 seconds samples of 1886 songs obtained from the Garageband si...
Background: Topic detection is a task that automatically identifies topics (e.g., "biochemistry" and "protein structure") in scientific articles based on infor...