Sciweavers

47 search results - page 5 / 10
» Text Degradations and OCR Training
ICDAR
2007
IEEE
13 years 11 months ago
Identification of Latin-Based Languages through Character Stroke Categorization
This paper presents a language identification technique that detects Latin-based languages of imaged documents without OCR. The proposed technique detects languages through the wo...
S. J. Lu, L. Li, Chew Lim Tan
ACL
2008
13 years 9 months ago
Adapting a WSJ-Trained Parser to Grammatically Noisy Text
We present a robust parser which is trained on a treebank of ungrammatical sentences. The treebank is created automatically by modifying Penn treebank sentences so that they conta...
Jennifer Foster, Joachim Wagner, Josef van Genabit...
ICDAR
2011
IEEE
12 years 7 months ago
Aletheia - An Advanced Document Layout and Text Ground-Truthing System for Production Environments
Large-scale digitisation has led to a number of new possibilities with regard to adaptive and learning based methods in the field of Document Image Analysis and OCR. For ground t...
C. Clausner, Stefan Pletschacher, Apostolos Antona...
ICPR
2004
IEEE
14 years 8 months ago
Decoder Banks: Versatility, Automation, and High Accuracy without Supervised Training
A methodology using decoder banks is proposed for high-accuracy, fully automatic recognition of machine printed text across a wide range of challenging image qualities, without re...
Henry S. Baird, Prateek Sarkar
DAS
2008
Springer
13 years 9 months ago
A Complete Optical Character Recognition Methodology for Historical Documents
In this paper a complete OCR methodology for recognizing historical documents, either printed or handwritten without any knowledge of the font, is presented. This methodology cons...
Georgios Vamvakas, Basilios Gatos, Nikolaos Stamat...