This paper presents an algorithm called CIPDEC (Content Integrity of Printed Documents using Error Correction), which identifies any modifications made to a printed document. CIPD...
To overcome the poor readability of text and poor recognizability of image features in low-resolution thumbnails, a novel image representation of compound document images - a Smar...
Kathrin Berkner, Edward L. Schwartz, Christophe Ma...
We propose a concise definition of the skew angle of a document, based on mathematical morphology. This definition has the advantage of being applicable to both binary and grey-scale...
The Medical Article Records System or MARS has been developed at the U.S. National Library of Medicine (NLM) for automated data entry of bibliographical information from medical j...
Using handwritten characters, we address two questions: (i) what is the group identification performance of different alphabets (upper and lower case) and (ii) what are the best cha...
Catalin I. Tomai, Devika M. Kshirsagar, Sargur N. ...
Symbolic Indirect Correlation (SIC) is a new classification method for unsegmented patterns. SIC requires two levels of comparisons. First, the feature sequences from an unknown q...
George Nagy, Ashutosh Joshi, Mukkai S. Krishnamoor...
Retrieving documents by subject matter is the general goal of information retrieval and other content access systems. There are other aspects of textual content, however, which fo...
In this paper we present an OCR validation module, implemented for the System for Preservation of Electronic Resources (SPER) developed at the U.S. National Library of Medicine. ...
We describe a technique of linguistic post-processing of whole-book recognition results. Whole-book recognition is a technique that improves recognition of book images using fully...
This paper presents a novel approach for multi-oriented text line extraction from historical handwritten Arabic documents. Because of the multi-orientation of lines and their ...