Fast Lexicon-Based Word Recognition in Noisy Index Card Images

This paper describes a complete system for reading typewritten lexicon words in noisy images, in this case museum index cards. The system is conceptually simple and straightforward to implement. It involves three stages of processing. The first stage extracts row-regions from the image, where each row is a hypothesized line of text. The next stage scans an OCR classifier over each row image, creating a character hypothesis graph in the process. This graph is then searched using a priority-queue-based algorithm for the best matches with a set of words (the lexicon). Performance evaluation on a set of museum archive cards indicates competitive accuracy and also reasonable throughput. The priority queue algorithm is over two hundred times faster than using flat dynamic programming on these graphs.
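The third stage can be illustrated with a small sketch. This is not the authors' algorithm, whose details the abstract does not give; it is a generic best-first (uniform-cost) search over the product of a character hypothesis graph and a lexicon trie. The graph encoding (a dict of `(next_node, char, cost)` edges), the function names `build_trie` and `best_word`, and the integer costs are all assumptions made for illustration.

```python
import heapq

_END = object()  # sentinel trie key marking the end of a lexicon word


def build_trie(lexicon):
    """Build a nested-dict trie over the lexicon words."""
    root = {}
    for word in lexicon:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[_END] = word
    return root


def best_word(edges, start, end, lexicon):
    """Return (cost, word) for the cheapest lexicon word spelled by a
    path from start to end, or None if no lexicon word fits.

    edges maps a graph node to a list of (next_node, char, cost) tuples.
    Costs must be non-negative (e.g. negative log probabilities) so that
    the first complete match popped from the queue is the cheapest one.
    """
    trie = build_trie(lexicon)
    counter = 0  # tie-breaker so the heap never compares trie dicts
    heap = [(0, counter, start, trie)]
    seen = set()
    while heap:
        cost, _, node, tnode = heapq.heappop(heap)
        if node == end and _END in tnode:
            return cost, tnode[_END]
        key = (node, id(tnode))
        if key in seen:
            continue  # this (graph node, prefix) pair was already expanded
        seen.add(key)
        for nxt, ch, step_cost in edges.get(node, []):
            child = tnode.get(ch)
            if child is not None:  # prune paths that leave the lexicon
                counter += 1
                heapq.heappush(heap, (cost + step_cost, counter, nxt, child))
    return None


# Hypothetical three-character row: each edge is one character hypothesis.
edges = {
    0: [(1, "c", 2), (1, "e", 1)],
    1: [(2, "a", 1), (2, "o", 3)],
    2: [(3, "t", 2), (3, "l", 1)],
}
print(best_word(edges, 0, 3, ["cat", "cot", "eel"]))
```

In this toy graph the cheapest unconstrained path spells "eal", which is not a word; the trie prunes it and the search returns the cheapest lexicon match instead. Expanding only the most promising partial matches, rather than filling a full table for every word, is the general reason a priority-queue search can outpace flat dynamic programming.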

Simon M. Lucas, Gregory Patoulas, Andy C. Downton

Document Analysis | ICDAR 2003 | Museum Index Cards | Typewritten Lexicon Words | First Stage Extracts

Post Info

Added	04 Jul 2010
Updated	04 Jul 2010
Type	Conference
Year	2003
Where	ICDAR
Authors	Simon M. Lucas, Gregory Patoulas, Andy C. Downton
