A Fast Preprocessing Method for Table Boundary Detection: Narrowing Down the Sparse Lines Using Solely Coordinate Information



With the rapid growth of PDF documents in digital libraries, recognizing document structure and detecting specific document components are useful for document storage, classification, and retrieval. Tables, as a specific document component, are ubiquitous. Accurately detecting the table boundary plays a crucial role in the later table structure decomposition and table data collection. In this paper, we propose a simple but effective table boundary detection method. Our method has two unique advantages compared with other work in this field: 1) Because most tables are text-based, we claim that the text objects of the PDF itself are sufficient for table detection. In addition, we believe that font information is not as reliable as other works state. 2) Based on the nature of table cells, we observe the sparse-line property of table rows. By filtering out the non-sparse lines initially, the table boundary detection problem can be simplified into the sparse line analysis p...
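
The sparse-line filtering idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract is truncated and does not give the actual criteria, so the `TextPiece` type, the helper names, and the thresholds `y_tol`, `min_gap`, and `max_coverage` are all hypothetical. The sketch assumes text pieces have already been extracted from the PDF with their coordinates, and it treats a line as sparse when it contains a wide horizontal gap or covers only a small fraction of the page width.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextPiece:
    """A text fragment with coordinates only (no font information)."""
    x0: float  # left edge
    x1: float  # right edge
    y: float   # vertical position of the baseline


def group_into_lines(pieces: List[TextPiece], y_tol: float = 2.0) -> List[List[TextPiece]]:
    """Group pieces whose baselines lie within y_tol of the line's first piece."""
    lines: List[List[TextPiece]] = []
    for piece in sorted(pieces, key=lambda p: (p.y, p.x0)):
        if lines and abs(lines[-1][0].y - piece.y) <= y_tol:
            lines[-1].append(piece)
        else:
            lines.append([piece])
    return lines


def is_sparse(line: List[TextPiece], page_width: float,
              min_gap: float = 20.0, max_coverage: float = 0.6) -> bool:
    """Assumed criterion: a wide gap between pieces, or low horizontal coverage."""
    ordered = sorted(line, key=lambda p: p.x0)
    covered = sum(p.x1 - p.x0 for p in ordered)
    largest_gap = max(
        (b.x0 - a.x1 for a, b in zip(ordered, ordered[1:])),
        default=0.0,
    )
    return largest_gap >= min_gap or covered / page_width < max_coverage


def sparse_lines(pieces: List[TextPiece], page_width: float) -> List[List[TextPiece]]:
    """Keep only the sparse lines; these are the table-row candidates."""
    return [ln for ln in group_into_lines(pieces) if is_sparse(ln, page_width)]
```

Under these assumptions, a full-width paragraph line is discarded while a row of widely separated cells is kept, so any later boundary analysis only needs to examine the remaining candidates.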

Ying Liu, Prasenjit Mitra, C. Lee Giles


DAS 2008 | Document Analysis | Specific Document Component | Table Boundary | Table Boundary Detection |


Post Info

Added	19 Oct 2010
Updated	19 Oct 2010
Type	Conference
Year	2008
Where	DAS
Authors	Ying Liu, Prasenjit Mitra, C. Lee Giles
