We present a system that classifies pixels in a document image according to marking type, such as machine print, handwriting, and noise. A segmenter module first splits an input ...
There is considerable interest in interdisciplinary combinations of automatic speech recognition (ASR), machine learning, natural language processing, text classification and info...
Mark Dredze, Aren Jansen, Glen Coppersmith, Ken Wa...
Perspective distortion always occurs while scanning thick, bound documents, resulting in two problems in the scanned grayscale image: (i) shade along the 'spine' of the book...
In this paper, we introduce the idea of Intent Analysis, which is to create a profile of the goals and intentions present in textual content. Intent Analysis, similar to Sentiment...
Typically, searching for information in a document collection amounts to refining a query and then scanning a large number of documents to determine their relevance. Active Summar...