This paper considers the problem of identifying on the Web compound documents (cDocs) ? groups of web pages that in aggregate constitute semantically coherent information entities...
Background: Existing tools for multiple-sequence alignment focus on aligning protein sequence or protein-coding DNA sequence, and are often based on extensions to Needleman-Wunsch...
Sponsored search is a multi-billion dollar business that generates most of the revenue for search engines. Predicting the probability that users click on ads is crucial to sponsor...
Background: It necessary to use highly accurate and statistics-based systems for viral and phage genome annotations. The GeneMark systems for gene-finding in virus and phage genom...
Learning from imbalanced datasets presents a convoluted problem both from the modeling and cost standpoints. In particular, when a class is of great interest but occurs relatively...
Nitesh V. Chawla, David A. Cieslak, Lawrence O. Ha...