This paper presents a language identification technique that detects Latin-based languages of imaged documents without OCR. The proposed technique detects languages through the wo...
We present a robust parser which is trained on a treebank of ungrammatical sentences. The treebank is created automatically by modifying Penn treebank sentences so that they conta...
Jennifer Foster, Joachim Wagner, Josef van Genabit...
- Large-scale digitisation has led to a number of new possibilities with regard to adaptive and learning based methods in the field of Document Image Analysis and OCR. For ground t...
C. Clausner, Stefan Pletschacher, Apostolos Antona...
A methodology using decoder banks is proposed for high-accuracy, fully automatic recognition of machine printed text across a wide range of challenging image qualities, without re...
In this paper a complete OCR methodology for recognizing historical documents, either printed or handwritten without any knowledge of the font, is presented. This methodology cons...