Sciweavers

ECCV
2008
Springer

Learning Visual Shape Lexicon for Document Image Content Recognition

Developing effective content recognition methods for diverse imagery continues to challenge computer vision researchers. We present a new approach for document image content categorization using a lexicon of shape features. Each lexical word corresponds to a scale- and rotation-invariant shape feature that is generic enough to be detected repeatably and segmentation-free. We learn a concise, structurally indexed shape lexicon from training by clustering and partitioning feature types through graph cuts. We demonstrate our approach on two challenging document image content recognition problems: 1) the classification of 4,500 Web images crawled from Google Image Search into three content categories: pure image, image with text, and document image; and 2) language identification of 8 languages (Arabic, Chinese, English, Hindi, Japanese, Korean, Russian, and Thai) on a database of 1,512 complex document images composed of mixed machine-printed text and handwriting. Our approach is capable t...
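To make the idea of a learned lexicon concrete, the following is a minimal sketch, not the authors' method. It assumes plain k-means as a stand-in for the graph-cut-based clustering and partitioning named in the abstract, and toy 2-D tuples in place of the scale- and rotation-invariant shape features. The names build_lexicon, nearest_word, and encode are hypothetical and chosen only for illustration.

```python
from math import dist


def nearest_word(descriptor, lexicon):
    """Index of the lexicon entry closest to the descriptor (Euclidean)."""
    return min(range(len(lexicon)), key=lambda j: dist(descriptor, lexicon[j]))


def build_lexicon(descriptors, k, iterations=20):
    """Toy k-means returning k centroids, used here as 'shape words'.

    Assumption: k-means replaces the paper's graph-cut clustering.
    """
    if len(descriptors) < k:
        raise ValueError("need at least k descriptors")
    # Deterministic initialisation: pick evenly spaced training points.
    step = len(descriptors) // k
    centroids = [tuple(descriptors[i * step]) for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for d in descriptors:
            clusters[nearest_word(d, centroids)].append(d)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return centroids


def encode(descriptors, lexicon):
    """Normalised histogram of lexicon-word occurrences for one image."""
    counts = [0] * len(lexicon)
    for d in descriptors:
        counts[nearest_word(d, lexicon)] += 1
    total = sum(counts) or 1  # avoid division by zero on empty input
    return [c / total for c in counts]


# Hypothetical toy data: two well-separated groups of descriptors.
training = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (1.0, 1.0), (0.9, 1.0), (1.0, 0.9)]
lexicon = build_lexicon(training, k=2)
histogram = encode([(0.05, 0.05), (0.95, 0.95), (1.0, 1.0)], lexicon)
```

A histogram of this kind could then be passed to any standard classifier to assign a content category or language label; which classifier the paper uses is not stated in the abstract.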
Guangyu Zhu, Xiaodong Yu, Yi Li, David S. Doermann
Added 15 Oct 2009
Updated 15 Oct 2009
Type Conference
Year 2008
Where ECCV