Keyword Spotting in Document Images through Word Shape Coding


Download www.comp.nus.edu.sg

With large databases of document images now available, users need a way to find keywords in those documents. One approach is to perform Optical Character Recognition (OCR) on each document and then index the resulting text. However, if the quality of the document is poor or time is critical, complete OCR of all images is infeasible. This paper builds upon previous work on Word Shape Coding to propose an alternative technique and combination of feature descriptors for keyword spotting without the use of OCR. Different sequence alignment similarity measures can be used for partial or whole-word matching. The proposed technique is tolerant of serifs, font styles and certain degrees of touching, broken or overlapping characters. It improves on previous work not only with better precision and a lower collision rate but, more importantly, with the ability to perform partial matching. Experimental results show that it is about 15 times faster than OCR. It is a promising technique to boost...

Shuyong Bai, Linlin Li, Chew Lim Tan


Complete OCR | Document | Document Analysis | Document Image | ICDAR 2009


» Language Identification in Degraded and Distorted Document Images

» Identification of Latin-Based Languages through Character Stroke Categorization

» Fast Key-Word Searching via Embedding and Active-DTW

» Script and Language Identification in Degraded and Distorted Document Images

Post Info

Added	21 May 2010
Updated	21 May 2010
Type	Conference
Year	2009
Where	ICDAR
Authors	Shuyong Bai, Linlin Li, Chew Lim Tan
