ABSTRACT: OCR is an error-prone process. It is time-consuming and expensive to manually proofread OCR results. The errors remaining in OCRed texts can cause serious problems in rea...
Developing effective content recognition methods for diverse imagery continues to challenge computer vision researchers. We present a new approach for document image content catego...
Guangyu Zhu, Xiaodong Yu, Yi Li, David S. Doermann
In recent years, many algorithms for the Web have been developed that work with information units distinct from individual web pages. These include segments of web pages or aggreg...
This paper presents a novel domain-independent text segmentation method, which identifies the boundaries of topic changes in long text documents and/or text streams. The method c...
Optical character recognition (OCR) remains a difficult problem for noisy documents or documents not scanned at high resolution. Many current approaches rely on stored font models...
Andrew Kae, Gary Huang, Erik Learned-miller, Carl ...