Sciweavers



ICPR
2000
IEEE


OCR with No Shape Training


Download www.ecse.rpi.edu

We present a document-specific OCR system and apply it to a corpus of faxed business letters. Unsupervised classification of the segmented character bitmaps on each page, using a "clump" metric, typically yields several hundred clusters with highly skewed populations. Letter identities are assigned to each cluster by maximizing matches with a lexicon of English words. We find that for two-thirds of the pages we can identify almost 80% of the words included in the lexicon, without any shape training. Residual errors are caused by mis-segmentation, including missed lines and punctuation. This research differs from earlier attempts to apply cipher decoding to OCR in (1) using real data, (2) using a more appropriate clustering algorithm, and (3) decoding a many-to-many instead of a one-to-one mapping between clusters and letters.
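
The lexicon-matching step can be illustrated with a toy sketch. This is not the authors' algorithm: it assumes clustering has already turned each word into a tuple of cluster IDs, it uses a brute-force search that only scales to tiny inputs, and it simplifies the many-to-many mapping to many-to-one (several clusters may share a letter, but each cluster gets one letter). The names `extend`, `decode`, and `decode_word`, and the sample data, are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Mapping = Dict[int, str]  # cluster ID -> assigned letter


def extend(mapping: Mapping, code_word: Sequence[int], candidate: str) -> Optional[Mapping]:
    """Return a copy of mapping in which code_word decodes to candidate,
    or None if lengths differ or an existing assignment conflicts."""
    if len(code_word) != len(candidate):
        return None
    new = dict(mapping)
    for cluster, letter in zip(code_word, candidate):
        if new.setdefault(cluster, letter) != letter:
            return None
    return new


def decode(code_words: List[Sequence[int]], lexicon: List[str]) -> Tuple[Mapping, int]:
    """Exhaustively search for the assignment that lets the most words
    decode to lexicon entries. Exponential time: toy inputs only."""
    best: Tuple[Mapping, int] = ({}, -1)

    def search(i: int, mapping: Mapping, score: int) -> None:
        nonlocal best
        # Prune branches that cannot strictly beat the best score so far.
        if score + (len(code_words) - i) <= best[1]:
            return
        if i == len(code_words):
            best = (mapping, score)
            return
        for candidate in lexicon:
            extended = extend(mapping, code_words[i], candidate)
            if extended is not None:
                search(i + 1, extended, score + 1)
        # Also allow leaving this word unmatched (e.g. a mis-segmented word).
        search(i + 1, mapping, score)

    search(0, {}, 0)
    return best


def decode_word(code_word: Sequence[int], mapping: Mapping) -> str:
    """Spell out a word, writing '?' for clusters with no assigned letter."""
    return "".join(mapping.get(cluster, "?") for cluster in code_word)


# Made-up data: clusters 1 and 7 are two visual variants of the same letter.
lexicon = ["the", "cat", "hat", "that", "tea"]
code_words = [(1, 2, 3), (4, 5, 7), (2, 5, 1), (7, 2, 5, 1)]
mapping, score = decode(code_words, lexicon)
print(score, [decode_word(w, mapping) for w in code_words])
```

Because several clusters may map to one letter, more than one assignment can reach the same score; the sketch simply keeps the first one it finds in lexicon order.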

Tin Kam Ho, George Nagy


Appropriate Clustering Algorithm | Computer Vision | Faxed Business Letters | ICPR 2000 | Segmented Character Bitmaps


Related Content

» Document Style Census for OCR

» A Gamebased Approach to Transcribing Images of Text

» Japanese OCR Error Correction using Character Shape Similarity and Statistical Language Mo...

» Text Degradations and OCR Training

» Shape Encoded Post Processing of Gurmukhi OCR

» A Complete Optical Character Recognition Methodology for Historical Documents

» Keyword Spotting in Document Images through Word Shape Coding

» An Open Source Tesseract Based Optical Character Recognizer for Bangla Script

» Improving OCR Accuracy for Classical Critical Editions

Post Info

Added	09 Nov 2009
Updated	09 Nov 2009
Type	Conference
Year	2000
Where	ICPR
Authors	Tin Kam Ho, George Nagy
