A Handwritten Character Extraction Algorithm for Multi-language Document Image

In this paper, we propose a novel method for extracting handwritten characters from multi-language document images, which may contain various types of characters, e.g. Chinese, English, Japanese, or a mixture of them. First, text patches in the document image are segmented by connected component analysis; the rules for merging connected components are chosen according to the results of language identification. Features are then extracted for each basic analysis unit, the text patch. A genetic algorithm is applied for feature fusion and patch type classification. Finally, a Markov random field model is used as a post-processing step that corrects misclassified text patch types by considering the document context. Experimental results show that the proposed algorithm can noticeably improve the performance of handwritten character extraction.

Keywords: handwritten character extraction; multi-language; document segmentation; feature fusion; Markov random field
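
The abstract does not spell out how the Markov random field step is formulated or solved. As a rough illustration only, the sketch below assumes a Potts-style smoothness term between neighbouring text patches and uses iterated conditional modes (ICM) for inference; the function name `icm_smooth`, the cost values, and the weight `beta` are all hypothetical and not taken from the paper.

```python
from typing import List, Sequence


def icm_smooth(
    unary: Sequence[Sequence[float]],
    neighbors: Sequence[Sequence[int]],
    beta: float = 1.0,
    max_iters: int = 10,
) -> List[int]:
    """Relabel patches so that neighbouring patches tend to agree.

    unary[i][k]  : cost of giving patch i the label k (lower is better),
                   e.g. derived from a per-patch classifier (assumption).
    neighbors[i] : indices of the patches adjacent to patch i.
    beta         : penalty paid for each neighbour with a different label.
    """
    # Start from the label each patch would get on its own.
    labels = [min(range(len(c)), key=c.__getitem__) for c in unary]

    def local_energy(i: int, label: int) -> float:
        # Unary cost plus a Potts penalty for each disagreeing neighbour.
        disagreements = sum(1 for j in neighbors[i] if labels[j] != label)
        return unary[i][label] + beta * disagreements

    for _ in range(max_iters):
        changed = False
        for i in range(len(unary)):
            best = min(range(len(unary[i])), key=lambda k: local_energy(i, k))
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:  # converged to a local minimum
            break
    return labels


# Toy example: five patches in a row, labels 0 = printed, 1 = handwritten.
# Patch 2 is weakly classified as printed but is surrounded by handwriting.
example_unary = [[2.0, 0.1], [2.0, 0.1], [0.4, 0.6], [2.0, 0.1], [2.0, 0.1]]
example_neighbors = [[1], [0, 2], [1, 3], [2, 4], [3]]

print(icm_smooth(example_unary, example_neighbors, beta=0.5))  # [1, 1, 1, 1, 1]
print(icm_smooth(example_unary, example_neighbors, beta=0.0))  # [1, 1, 0, 1, 1]
```

With `beta=0.0` the context is ignored and the isolated "printed" label survives; with `beta=0.5` the neighbouring patches outweigh the weak per-patch evidence. ICM only finds a local minimum, so this is a simple stand-in for whatever inference the authors actually used.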

Yonghong Song, Guilin Xiao, Yuanlin Zhang, Lei Yang, Liuliu Zhao

Document Analysis | Document Segmentation | ICDAR 2011 | Markov Random Field | Random Field Model |

» Character Extraction from Interfering Background Analysis of Double-Sided Handwritten Arch...

» A Baseline Dependent Approach for Persian Handwritten Character Segmentation

» Analysis of Handwriting Individuality Using Word Features

» Matching of Double-Sided Document Images to Remove Interference

» Text Extraction from Gray Scale Historical Document Images Using Adaptive Local Connectivi...

» Script-Independent Handwritten Textlines Segmentation Using Active Contours

» Feature Extraction by Hierarchical Overlapped Elastic Meshing for Handwritten Chinese Char...

» Hidden Markov Random Field Based Approach for Off-Line Handwritten Chinese Character Recogn...

Post Info

Added	24 Dec 2011
Updated	24 Dec 2011
Type	Journal
Year	2011
Where	ICDAR
Authors	Yonghong Song, Guilin Xiao, Yuanlin Zhang, Lei Yang, Liuliu Zhao
