DAS
2006
Springer

Language Identification in Degraded and Distorted Document Images

This paper presents a language identification technique that differentiates Latin-based languages in degraded and distorted document images. Unlike previously reported methods, which transform word images through a character shape coding process, our method captures word shapes directly from local extremum points and horizontal intersection numbers, both of which tolerate noise, character segmentation errors, and slight skew distortion. For each language studied, a word shape template and a word frequency template are first constructed using the proposed word shape coding scheme. Identification is then performed using the Bray-Curtis or Hamming distance between the word shape codes of query images and the constructed word shape and frequency templates. Experiments show that the average identification rate across eight Latin-based languages exceeds 99%. ...
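The distance-based identification step can be illustrated with a minimal sketch. The abstract does not give the actual word shape coding scheme or say how the two distances are combined, so everything below is an assumption made for illustration: the code strings (`"a1"`, `"b2"`, ...), the helper names, and the choice to compare normalized code frequencies with the Bray-Curtis distance and to pick the nearest template.

```python
from collections import Counter


def bray_curtis(p, q):
    """Bray-Curtis distance between two non-negative frequency mappings."""
    keys = set(p) | set(q)
    num = sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)
    den = sum(p.get(k, 0.0) + q.get(k, 0.0) for k in keys)
    return num / den if den else 0.0


def hamming(a, b):
    """Hamming distance between two equal-length code strings."""
    if len(a) != len(b):
        raise ValueError("codes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def normalize(counts):
    """Turn raw counts into relative frequencies."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def identify(query_codes, templates):
    """Return the language whose frequency template is nearest to the query."""
    query = normalize(Counter(query_codes))
    return min(templates, key=lambda lang: bray_curtis(query, templates[lang]))


# Hypothetical word shape codes; real codes would be derived from
# extremum points and horizontal intersection numbers of word images.
templates = {
    "english": normalize(Counter(["a1", "a1", "b2", "c3"])),
    "german": normalize(Counter(["d4", "d4", "e5", "a1"])),
}
query = ["a1", "b2", "a1", "a1", "c3"]

print(identify(query, templates))
```

Here `hamming` is shown only as the second distance named in the abstract, for example to compare two individual codes of equal length; how the paper applies it against the word shape template is not stated in this listing.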
Shijian Lu, Chew Lim Tan, Weihua Huang
Added 22 Aug 2010
Updated 22 Aug 2010
Type Conference
Year 2006
Where DAS
Authors Shijian Lu, Chew Lim Tan, Weihua Huang