We describe a new algorithm for protein classification and the detection of remote homologs. The rationale is to exploit both vertical and horizontal information of a multiple alig...
Addressed in this paper is the issue of 'email data cleaning' for text mining. Many text mining applications need to take emails as input. Email data is usually noisy and thus i...
The performance of document clustering systems depends on employing optimal text representations, which are not only difficult to determine beforehand, but also may vary from one ...
Text summarization is a data reduction process. The use of text summarization enables users to reduce the amount of text that must be read while still assimilating the core inform...
Lawrence H. Reeve, Hyoil Han, Saya V. Nagori, Jona...
Background: One step in the model organism database curation process is to find, for each article, the identifier of every gene discussed in the article. We consider a relaxation ...