This paper presents a pair of identification techniques that automatically detect the scripts and orientations of document images suffering from various types of document degradation. ...
Patent document images maintained by the U.S. patent database have a specific format, in which figures and text descriptions are separated into different sections. This makes it...
We argue that the advent of large volumes of full-length text, as opposed to short texts like abstracts and newswire, should be accompanied by corresponding new approaches to information ...
Documents can be assigned keywords by frequency analysis of the terms found in the document text, which arguably is the primary source of knowledge about the document itself. By in...
Anette Hulth, Jussi Karlgren, Anna Jonsson, Henrik...
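The frequency analysis mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' method, whose details are cut off in this listing; it only counts term occurrences and returns the most frequent ones. The function name `frequency_keywords`, the parameter `k`, and the tiny `STOPWORDS` set are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stopword list (an assumption for this sketch);
# a practical system would use a much fuller list.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "by", "for", "on", "with"}


def frequency_keywords(text, k=3):
    """Return the k most frequent non-stopword terms in `text`."""
    # Lowercase the text and keep alphabetic runs only as terms.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Count every term that is not a stopword.
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    # most_common sorts by descending count; ties keep first-seen order.
    return [term for term, _ in counts.most_common(k)]
```

Raw counts favor words that are common in every document, so a sketch like this would normally be combined with some form of weighting against a background collection; that step is left out here because the listing does not say how the authors handle it.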
Text extraction is an important phase in document recognition systems. In order to segment text from a document page it is necessary to detect all the possible manuscript text reg...
Rodolfo P. dos Santos, Gabriela S. Clemente, Ing R...