Word Retrieval in Historical Document Using Character-Primitives

14 years 7 months ago

Download www.icdar2011.org

Word searching and indexing in historical document collections is a challenging problem because characters in these documents are often touching or broken due to degradation and ageing effects. For efficient searching in such historical documents, this paper presents a novel approach to word spotting using string matching of character primitives. We describe a text string as a sequence of primitives, each of which consists of a single character or a part of a character. Primitive segmentation is performed by analyzing text background information obtained with the water reservoir technique. Next, the primitives are clustered using template matching, and a codebook of representative primitives is built. Using this primitive codebook, the text information in the document images is encoded and stored. For a query word, we segment it into primitives and encode the word as a string of representative primitives from the codebook. Finally, an approximate string matching is applied to find similar wor...
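The final matching step can be illustrated with a minimal sketch. It assumes that each word has already been encoded as a sequence of integer codebook indices, and it uses plain Levenshtein edit distance as the approximate string matching; the abstract does not say which matching algorithm or cost model the paper uses, so this is a stand-in. The names `edit_distance`, `spot_words`, and `max_distance`, and the sample index, are hypothetical.

```python
from typing import List, Sequence, Tuple


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Levenshtein distance between two sequences of codebook indices.

    Assumption: unit cost for insertion, deletion, and substitution.
    """
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # deleting the first i primitives of a
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def spot_words(query: Sequence[int],
               indexed_words: List[Tuple[str, List[int]]],
               max_distance: int = 1) -> List[Tuple[str, int]]:
    """Return (word_id, distance) pairs within max_distance, closest first."""
    hits = []
    for word_id, codes in indexed_words:
        d = edit_distance(query, codes)
        if d <= max_distance:
            hits.append((word_id, d))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical index: word identifiers mapped to primitive code sequences.
index = [("w1", [3, 7, 7, 2]), ("w2", [3, 7, 2]), ("w3", [9, 9, 9, 9])]
matches = spot_words([3, 7, 7, 2], index, max_distance=1)
```

Tolerating a small edit distance lets a query still match a word whose primitives were split or merged by degradation, which is the motivation the abstract gives for approximate rather than exact matching.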

Partha Pratim Roy, Jean-Yves Ramel, Nicolas Ragot
