Sciweavers

DAS
2010
Springer

Text extraction from graphical document images using sparse representation

This paper presents a novel method for extracting text from graphical document images. A graphical document image containing both text and graphics components is treated as a two-dimensional signal in which text and graphics have different morphological characteristics. The proposed algorithm relies on a sparse representation framework with two appropriately chosen discriminative overcomplete dictionaries, each of which gives a sparse representation of one type of signal and a non-sparse representation of the other. Text and graphics components are separated by promoting a sparse representation of the input image in these two dictionaries. In post-processing, heuristic rules group the text components into text strings. The proposed method overcomes the problem of text touching graphics. Preliminary experiments show promising results on different types of documents. Categories and Subject Descriptors I.4.6 [Image Processing and Computer Visio...
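The two-dictionary separation idea can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation: the abstract does not name the dictionaries used, so this example assumes a 1-D signal, an orthonormal DCT basis as a stand-in for the dictionary that sparsely represents the smooth component, and the identity basis for the spiky component. The function names (`dct_matrix`, `soft`, `mca_separate`) and the linearly decreasing threshold schedule are illustrative choices borrowed from the general morphological component analysis literature.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix; row k is the k-th cosine basis vector."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (i + 0.5) * k / n)
    m[0, :] /= np.sqrt(2.0)
    return m

def soft(x, t):
    """Soft-thresholding, the proximal operator of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def mca_separate(y, n_iter=200):
    """Split y into a part sparse in the DCT basis and a part sparse
    in the identity basis by alternating soft-thresholding with a
    threshold that decreases linearly to zero (assumed schedule)."""
    n = y.size
    C = dct_matrix(n)
    smooth = np.zeros(n)
    spikes = np.zeros(n)
    t_max = max(np.abs(y).max(), np.abs(C @ y).max())
    for t in np.linspace(t_max, 0.0, n_iter):
        # Explain what the spikes do not: threshold in the DCT domain.
        smooth = C.T @ soft(C @ (y - spikes), t)
        # Explain what the smooth part does not: threshold in the sample domain.
        spikes = soft(y - smooth, t)
    return smooth, spikes

# Toy input: one cosine atom plus two isolated spikes (hypothetical data).
n = 128
true_smooth = 40.0 * dct_matrix(n)[4]
true_spikes = np.zeros(n)
true_spikes[[20, 90]] = 4.0
smooth_est, spikes_est = mca_separate(true_smooth + true_spikes)
```

Each component ends up in the dictionary where it is cheapest to represent: the cosine needs one DCT coefficient but many samples, and the spikes need one sample each but many DCT coefficients. For real document images the same loop would run on 2-D arrays with dictionaries suited to text and to graphics, which this sketch does not attempt to choose.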
Thai V. Hoang, Salvatore Tabbone
Added 24 Aug 2010
Updated 24 Aug 2010
Type Conference
Year 2010
Where DAS
Authors Thai V. Hoang, Salvatore Tabbone