Data deduplication has become a popular technology for reducing the amount of storage space necessary for backup and archival data. Content-defined chunking (CDC) techniques are w...
The availability of large on-line text corpora provides a natural and promising bridge between the worlds of natural language processing (NLP) and machine learning (ML). In recent...
The design and implementation of a search engine for lecture webcasts is described. A searchable text index is created, allowing users to locate material within lecture videos foun...
John Adcock, Matthew Cooper, Laurent Denoue, Hamed...
The main goal of the motif finding problem is to detect novel, over-represented, unknown signals in a set of sequences (e.g., transcription factor binding sites in a genome). The mo...
Chandan K. Reddy, Yao-Chung Weng, Hsiao-Dong Chian...
Background: Cluster analysis has been widely applied for investigating structure in bio-molecular data. A drawback of most clustering algorithms is that they cannot automatically ...