This paper describes a system for efficient indexing and retrieval of words in collections of document images. The proposed method is based on two main principles: unsupervised pr...
Scanning pages from a thick, bound book introduces two sources of distortion in the document images: 1) shading along the book spine, and 2) warping of the book surface...
Recently, high-resolution digital cameras have made the digitization process more flexible and convenient than traditional scanning technology. Therefore, document image analysis ...
A large annotated corpus is critical to the development of robust optical character recognizers (OCRs). However, creation of annotated corpora is a tedious task. It is laborious, ...
Our proposed approach to text and line-art extraction requires accurately locating a text-string box and identifying external line vectors incident on the box. The results of extra...
Luyang Li, George Nagy, Ashok Samal, Sharad C. Set...