

Word Retrieval in Historical Document Using Character-Primitives

Word searching and indexing in historical document collections is a challenging problem because characters in these documents are often touching or broken due to degradation and ageing effects. For efficient searching in such historical documents, this paper presents a novel approach to word spotting using string matching of character primitives. We describe the text string as a sequence of primitives, each of which is a single character or a part of a character. Primitive segmentation is performed by analyzing text background information obtained by the water reservoir technique. Next, the primitives are clustered using template matching, and a codebook of representative primitives is built. Using this primitive codebook, the text information in the document images is encoded and stored. For a query word, we segment it into primitives and encode the word as a string of representative primitives from the codebook. Finally, an approximate string matching is applied to find similar wor...
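
The final matching step can be illustrated with a minimal sketch. It assumes each stored word has already been encoded as a list of integer codebook indices, and it uses plain Levenshtein distance as a stand-in, since the abstract does not say which approximate string matching algorithm the paper uses. The names `edit_distance`, `spot_word`, `index`, and `max_ratio` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Levenshtein distance between two sequences of codebook indices."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # drop primitive x
                            curr[j - 1] + 1,      # insert primitive y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def spot_word(query: Sequence[int],
              index: Dict[str, List[int]],
              max_ratio: float = 0.34) -> List[Tuple[str, int]]:
    """Return (word_id, distance) pairs whose distance is small relative
    to the longer of the two code strings, best matches first.

    max_ratio is an assumed tolerance, not a value from the paper.
    """
    hits = []
    for word_id, codes in index.items():
        d = edit_distance(query, codes)
        if d <= max_ratio * max(len(query), len(codes), 1):
            hits.append((word_id, d))
    return sorted(hits, key=lambda h: h[1])


# Toy index: word identifiers mapped to made-up primitive code strings.
index = {"w1": [3, 7, 7, 2], "w2": [3, 7, 2], "w3": [9, 9, 1, 4, 5]}
matches = spot_word([3, 7, 7, 2], index)
```

Tolerating a small edit distance is what lets a query still match a word whose touching or broken characters were segmented into slightly different primitives.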
Partha Pratim Roy, Jean-Yves Ramel, Nicolas Ragot
Added 24 Dec 2011
Updated 24 Dec 2011
Type Journal
Year 2011