This paper presents BlogBuster, a tool for extracting a corpus from the blogosphere. The topic of cleaning arbitrary web pages with the goal of extracting a corpus from web data, ...
The wealth of transcript information that has been made publicly available in recent years has led to large pools of individual web sites offering access to bioinformatics softwar...
Coral del Val, Peter Ernst, Mechthild Falkenhahn, ...
Many websites have large collections of pages generated dynamically from an underlying structured source like a database. The data of a category are typically encoded into similar...
Background: Improvements in DNA sequencing technology and methodology have led to the rapid expansion of databases comprising DNA sequence, gene and genome data. Lower operational...
The detection of repeated subsequences, time series motifs, is a problem which has been shown to have great utility for several higher-level data mining algorithms, including clas...