Automatic Feature Selection with Applications to Script Identification of Degraded Documents

Current approaches to script identification rely on hand-selected features and often require processing a significant part of the document to achieve reliable identification. We present an approach that applies a large pool of image features to a small training sample and uses feature subset selection techniques to automatically choose the subset with the most discriminating power. At run time we use a classifier coupled with an evidence accumulation engine to report a script label once a preset likelihood threshold has been reached. We apply the system to a diverse corpus of printed Russian and English documents that suffer from common degradation problems. Our validation study shows promising results both in terms of script identification accuracy and the ability to identify the script at the scale of individual words and text lines.
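The abstract does not say which selection criterion, classifier, or accumulation rule the authors use, so the following is only an illustrative sketch of the two stages under stated assumptions. A per-feature Fisher-score ranking stands in for the feature subset selection step, and a running sum of log-odds (assuming equal priors and independent observations) stands in for the evidence accumulation engine. The names `fisher_score`, `select_features`, and `accumulate_evidence`, and the default threshold of 0.95, are hypothetical and do not come from the paper.

```python
import math


def fisher_score(values_a, values_b):
    """Separation of one feature between two classes (higher is better)."""
    mean_a = sum(values_a) / len(values_a)
    mean_b = sum(values_b) / len(values_b)
    var_a = sum((v - mean_a) ** 2 for v in values_a) / len(values_a)
    var_b = sum((v - mean_b) ** 2 for v in values_b) / len(values_b)
    # Small constant avoids division by zero for constant features.
    return (mean_a - mean_b) ** 2 / (var_a + var_b + 1e-12)


def select_features(samples_a, samples_b, k):
    """Return the indices of the k features with the highest Fisher score.

    Assumption: simple per-feature ranking, used here as a stand-in for
    whatever subset selection technique the paper applies.
    """
    n_features = len(samples_a[0])
    scores = [
        fisher_score([s[j] for s in samples_a], [s[j] for s in samples_b])
        for j in range(n_features)
    ]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


def accumulate_evidence(posteriors, threshold=0.95, labels=("Russian", "English")):
    """Accumulate classifier outputs until one label is likely enough.

    posteriors: iterable of P(labels[0] | observation), one per word or
    connected component, as produced by some hypothetical classifier.
    Returns (label, observations_used); label is None if the threshold
    was never reached.
    """
    bound = math.log(threshold / (1.0 - threshold))
    log_odds = 0.0
    used = 0
    for used, p in enumerate(posteriors, start=1):
        p = min(max(p, 1e-6), 1.0 - 1e-6)  # keep the logarithm finite
        log_odds += math.log(p / (1.0 - p))
        if log_odds >= bound:
            return labels[0], used
        if log_odds <= -bound:
            return labels[1], used
    return None, used
```

In this sketch a label is reported as soon as the accumulated log-odds cross the bound, which reflects the idea of stopping after only a few words or a single text line rather than processing the whole document.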

Vitaly Ablavsky, Mark R. Stevens

Document Analysis | ICDAR 2003 | Preset Likelihood Threshold | Script Identification | Script Identification Accuracy

Added	04 Jul 2010
Updated	04 Jul 2010
Type	Conference
Year	2003
Where	ICDAR
Authors	Vitaly Ablavsky, Mark R. Stevens
