Sciweavers



DAS
2006
Springer


Language Identification in Degraded and Distorted Document Images

15 years 10 months ago



This paper presents a language identification technique that differentiates Latin-based languages in degraded and distorted document images. Unlike previously reported methods, which transform word images through a character shape coding process, our method captures word shapes directly using local extremum points and horizontal intersection numbers, both of which are tolerant of noise, character segmentation errors, and slight skew distortions. For each language studied, a word shape template and a word frequency template are first constructed based on the proposed word shape coding scheme. Identification is then accomplished based on the Bray-Curtis or Hamming distance between the word shape codes of query images and the constructed word shape and frequency templates. Experiments show that the average identification rate on eight Latin-based languages exceeds 99%. . . .
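The template-matching step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the word shape codes below are hypothetical strings, since the abstract does not give the coding scheme built from local extremum points and horizontal intersection numbers, and the function names, the toy templates, and the choice of the nearest template as the decision rule are all assumptions made for illustration.

```python
from collections import Counter


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two non-negative frequency maps."""
    keys = set(p) | set(q)
    num = sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)
    den = sum(p.get(k, 0.0) + q.get(k, 0.0) for k in keys)
    return num / den if den else 0.0


def hamming(a, b):
    """Number of positions at which two equal-length codes differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length codes")
    return sum(x != y for x, y in zip(a, b))


def frequency_template(codes):
    """Relative frequency of each word shape code in a list of codes."""
    counts = Counter(codes)
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()}


def identify(query_codes, templates):
    """Return the language whose frequency template is nearest to the query."""
    query = frequency_template(query_codes)
    return min(templates, key=lambda lang: bray_curtis(query, templates[lang]))


# Hypothetical toy templates; real ones would be built from training images.
TEMPLATES = {
    "english": frequency_template(["121", "121", "2013", "11"]),
    "german": frequency_template(["3102", "3102", "121", "40"]),
}

if __name__ == "__main__":
    print(identify(["121", "11", "121"], TEMPLATES))
    print(hamming("1021", "1121"))
```

Bray-Curtis suits the comparison of frequency distributions because it normalizes by the total mass of both vectors, while Hamming distance applies only to codes of equal length; how the paper combines the two is not stated in the abstract.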

Shijian Lu, Chew Lim Tan, Weihua Huang


DAS 2006 | Document Analysis | Shape Coding | Word Shape | Word Shape Template


Related Content

» Script and Language Identification in Degraded and Distorted Document Images

» Identification of scripts and orientations of degraded document images

» Identification of Latin-Based Languages through Character Stroke Categorization

» Automatic Feature Selection with Applications to Script Identification of Degraded Documen...

» Script identification of camera-based images

» Geometric distortion signatures for printer identification

» Language Identification of Character Images Using Machine Learning Techniques

» Restoration of images scanned from thick bound documents

» Retrieval of machine-printed Latin documents through Word Shape Coding

Post Info

Added	22 Aug 2010
Updated	22 Aug 2010
Type	Conference
Year	2006
Where	DAS
Authors	Shijian Lu, Chew Lim Tan, Weihua Huang
