We consider the problem of retrieving multiple documents relevant to the single subtopics of a given web query, termed “full-subtopic retrieval”. To solve this problem we pres...
Andrea Bernardini, Claudio Carpineto, Massimiliano...
Government agencies must often quickly organize and analyze large amounts of textual information, for example comments received as part of notice and comment rulemaking. Hierarchi...
Training a system to recognize handwritten words is a task that requires a large amount of data with their correct transcription. However, the creation of such a training set, inc...
In this paper, we present a semi-supervised learning method for web page classification, leveraging click logs to augment training data by propagating class labels to unlabeled si...
Soo-Min Kim, Patrick Pantel, Lei Duan, Scott Gaffn...
WISDOM++ is a document analysis system whose main design requirements are real-time user interaction and adaptivity. This paper presents the two-phased skew estimation algorithm a...