ICDAR
2003
IEEE

Automatic Feature Selection with Applications to Script Identification of Degraded Documents

Current approaches to script identification rely on hand-selected features and often require processing a significant part of the document to achieve reliable identification. We present an approach that applies a large pool of image features to a small training sample and uses feature subset selection techniques to automatically choose the subset with the most discriminating power. At run time, we use a classifier coupled with an evidence accumulation engine that reports a script label once a preset likelihood threshold has been reached. We apply the system to a diverse corpus of printed Russian and English documents that suffer from common degradation problems. Our validation study shows promising results in terms of both script identification accuracy and the ability to identify script at the scale of individual words and text lines.
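The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract names neither the selection algorithm, the classifier, nor the accumulation rule. The sketch assumes greedy forward selection scored by nearest-centroid training accuracy, and an accumulator that stops once the leading script's summed log-likelihood exceeds the runner-up's by a preset margin. All function names, labels, and values are hypothetical.

```python
def centroid_accuracy(X, y, cols):
    """Training accuracy of a nearest-centroid classifier using only `cols`."""
    classes = sorted(set(y))
    centroids = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, label in zip(X, y):
        # Predict the class whose centroid is closest in squared distance.
        pred = min(
            classes,
            key=lambda c: sum((x[j] - m) ** 2 for j, m in zip(cols, centroids[c])),
        )
        correct += pred == label
    return correct / len(y)


def forward_select(X, y, k):
    """Greedy forward selection (assumed criterion): add one feature at a time,
    keeping the one that most improves nearest-centroid accuracy."""
    chosen = []
    remaining = list(range(len(X[0])))
    while remaining and len(chosen) < k:
        best = max(remaining, key=lambda j: centroid_accuracy(X, y, chosen + [j]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


def accumulate_evidence(observations, threshold):
    """Sum per-observation log-likelihoods for each script label.

    `observations` yields dicts mapping label -> log-likelihood for one
    observation (for example one character or word). Returns
    (label, observations_used) once the leader's margin over the runner-up
    reaches `threshold` (an assumed stopping rule), else (None, count).
    """
    totals = {}
    n = 0
    for n, obs in enumerate(observations, start=1):
        for label, ll in obs.items():
            totals[label] = totals.get(label, 0.0) + ll
        ranked = sorted(totals.values(), reverse=True)
        if len(ranked) > 1 and ranked[0] - ranked[1] >= threshold:
            return max(totals, key=totals.get), n
    return None, n


# Toy data: feature 0 carries no class information, feature 1 separates the scripts.
X = [[0.5, 0.0], [0.4, 0.1], [0.5, 1.0], [0.4, 0.9]]
y = ["en", "en", "ru", "ru"]
selected = forward_select(X, y, k=1)

# Three observations, each slightly favoring "ru".
stream = [{"ru": -1.0, "en": -2.0}] * 3
decision = accumulate_evidence(stream, threshold=2.5)
```

With this toy data the selector keeps only the discriminating feature, and the accumulator commits to a label after the third observation. A higher threshold trades later decisions for more confidence, which is the kind of trade-off that decides whether script can be identified from a single word or needs a full text line.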
Vitaly Ablavsky, Mark R. Stevens
Type Conference