A Model for Detecting and Merging Vertically Spanned Table Cells in Plain Text Documents

A spanned cell in a table is a single, complete unit that physically occupies multiple columns and/or multiple rows. Spanned cells are common in tables, and they are a significant cause of error in the extraction of tables from free text documents. In this paper, we present a model for the detection and merging of vertically spanned cells for tables presented in plain text documents. Our model and algorithm are based purely on the layout features of the tables, and they require no semantic understanding of the documents. When tested on the 98 tables appearing in 40 randomly selected documents from a corpus of company announcements from the Australian Stock Exchange (ASX), our algorithm achieves an accuracy of 86.79% in detecting and merging vertically spanned cells.
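
To make the idea of a vertically spanned cell concrete, the following is a minimal sketch of a layout-only merge. It is not the model described in the paper, whose details are not given in this abstract. It assumes the table has already been split into rows of equal-width column cells, and it uses a simple hypothetical heuristic: a row whose first column is blank but which has text elsewhere is treated as a continuation of the row above. The function name `merge_continuation_rows` is illustrative only.

```python
from typing import List


def merge_continuation_rows(rows: List[List[str]]) -> List[List[str]]:
    """Merge rows that look like continuations of the row above.

    Assumed heuristic (not taken from the paper): a row whose first
    column is blank, but which has text in some other column, continues
    the previous row. All rows are assumed to have the same number of
    columns.
    """
    merged: List[List[str]] = []
    for row in rows:
        cells = [c.strip() for c in row]
        is_continuation = (
            bool(merged)          # there must be a row above to merge into
            and bool(cells)
            and cells[0] == ""    # first column is blank
            and any(cells)        # but the row is not entirely blank
        )
        if is_continuation:
            prev = merged[-1]
            # Append each non-empty fragment to the cell directly above it.
            merged[-1] = [
                (p + " " + c).strip() if c else p
                for p, c in zip(prev, cells)
            ]
        else:
            merged.append(cells)
    return merged


# Hypothetical example: "Sales of goods / and services" is one cell
# that spans two text lines.
example_rows = [
    ["Item", "Description", "Amount"],
    ["Revenue", "Sales of goods", "1,200"],
    ["", "and services", ""],
    ["Costs", "Operating", "800"],
]
merged_rows = merge_continuation_rows(example_rows)
```

In this example the third input row is folded into the second, so the description cell becomes "Sales of goods and services". A heuristic this simple will misfire on tables whose first column is legitimately empty, which is one reason a fuller layout model is needed.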

Vanessa Long, Robert Dale, Steve Cassidy

Document Analysis | Free Text Documents | ICDAR 2005 | Plain Text Documents | Text Documents

Post Info

Added	24 Jun 2010
Updated	24 Jun 2010
Type	Conference
Year	2005
Where	ICDAR
Authors	Vanessa Long, Robert Dale, Steve Cassidy
