Hybrid Page Layout Analysis via Tab-Stop Detection

A new hybrid page layout analysis algorithm is proposed, which uses bottom-up methods to form an initial data-type hypothesis and locate the tab-stops that were used when the page was formatted. The detected tab-stops are used to deduce the column layout of the page. The column layout is then applied in a top-down manner to impose structure and reading order on the detected regions. The complete C++ source code implementation is available as part of the Tesseract open source OCR engine at http://code.google.com/p/tesseract-ocr.
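The bottom-up then top-down flow described in the abstract can be illustrated with a toy sketch. This is not the Tesseract implementation and not the paper's algorithm: it assumes text lines are already available as `(left, top, right, bottom)` boxes, treats a tab-stop as nothing more than a cluster of aligned left edges, and uses hypothetical names (`find_tab_stops`, `reading_order`) and hypothetical thresholds (`tolerance`, `min_support`) chosen only for illustration.

```python
def find_tab_stops(boxes, tolerance=5, min_support=3):
    """Bottom-up step (simplified): cluster left edges that line up.

    boxes: list of (left, top, right, bottom) tuples for text lines.
    A cluster counts as a tab-stop candidate only if at least
    `min_support` boxes share it. Both thresholds are assumptions.
    """
    lefts = sorted(box[0] for box in boxes)
    clusters = []
    for x in lefts:
        # Extend the current cluster if x is close to its last member.
        if clusters and x - clusters[-1][-1] <= tolerance:
            clusters[-1].append(x)
        else:
            clusters.append([x])
    # Represent each sufficiently supported cluster by its rounded mean.
    return [round(sum(c) / len(c)) for c in clusters if len(c) >= min_support]


def reading_order(boxes, tab_stops):
    """Top-down step (simplified): impose column structure on the boxes.

    Each box goes to the column whose tab-stop is nearest to its left
    edge; boxes are then read column by column, top to bottom.
    """
    def column_of(box):
        return min(range(len(tab_stops)),
                   key=lambda i: abs(box[0] - tab_stops[i]))

    return sorted(boxes, key=lambda box: (column_of(box), box[1]))
```

Given two groups of lines whose left edges sit near x = 50 and x = 350, `find_tab_stops` would report two candidates and `reading_order` would list the left column before the right one. A real system also has to handle right-aligned and ragged edges, images, and tables, which this sketch ignores.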

Raymond W. Smith

Column Layout | Document Analysis | Hybrid Page Layout | ICDAR 2009 | Initial Data-type Hypothesis

Post Info

Added	21 May 2010
Updated	21 May 2010
Type	Conference
Year	2009
Where	ICDAR
Authors	Raymond W. Smith
