The PDF format is commonly used for the exchange of documents on the Web and there is a growing need to understand and extract or repurpose data held in PDF documents. Many system...
When trained and evaluated on accurately labeled datasets, online email spam filters are remarkably effective, achieving error rates an order of magnitude better than classifie...
— Ordinal regression is an important type of learning, which has properties of both classification and regression. Here we describe an effective approach to adapt a traditional ...
A major source of information (often the most crucial and informative part) in scholarly articles from scientific journals, proceedings and books are the figures that directly pro...
Amr Ahmed, Eric P. Xing, William W. Cohen, Robert ...
In many Web applications, such as blog classification and newsgroup classification, labeled data are in short supply. It often happens that obtaining labeled data in a new domain ...