Table Extraction Using Spatial Reasoning on the CSS2 Visual Box Model

Tables on web pages contain a huge amount of semantically explicit information, which makes them a worthwhile target for automatic information extraction and knowledge acquisition from the Web. However, the task of table extraction from web pages is difficult, because of HTML's design purpose to convey visual instead of semantic information. In this paper, we propose a robust technique for table extraction from arbitrary web pages. This technique relies upon the positional information of visualized DOM element nodes in a browser and, hereby, separates the intricacies of code implementation from the actual intended visual appearance. The novel aspect of the proposed web table extraction technique is the effective use of spatial reasoning on the CSS2 visual box model, which shows a high level of robustness even without any form of learning (F-measure 90%). We describe the ideas behind our approach, the tabular pattern recognition algorithm operating on a double topographical grid s...
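The central idea in the abstract, reasoning over where elements are rendered rather than over the HTML that produced them, can be illustrated with a small sketch. This is not the authors' algorithm (the abstract is cut off before it is described) but a simplified stand-in: it takes rendered bounding boxes, clusters their left and top edges, and checks whether the boxes fill a regular grid. The `Box` type, the function names, and the pixel tolerance are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Rendered bounding box of one element, in pixels (hypothetical type)."""
    left: float
    top: float
    width: float
    height: float


def cluster_coords(values, tol=2.0):
    """Map each coordinate to a cluster index.

    Sorted neighbours that differ by at most `tol` share a cluster, so small
    rendering offsets do not split a row or column.
    """
    index = {}
    cluster_id = -1
    previous = None
    for v in sorted(set(values)):
        if previous is None or v - previous > tol:
            cluster_id += 1
        index[v] = cluster_id
        previous = v
    return index


def to_grid(boxes, tol=2.0):
    """Assign each box a (row, column) cell from its top and left edges."""
    rows = cluster_coords([b.top for b in boxes], tol)
    cols = cluster_coords([b.left for b in boxes], tol)
    return {(rows[b.top], cols[b.left]): b for b in boxes}


def looks_tabular(boxes, tol=2.0, min_rows=2, min_cols=2):
    """Return True if the boxes fill a complete rows x columns grid."""
    if not boxes:
        return False
    grid = to_grid(boxes, tol)
    n_rows = 1 + max(r for r, _ in grid)
    n_cols = 1 + max(c for _, c in grid)
    # Every cell must be occupied, and no two boxes may share a cell.
    complete = len(grid) == n_rows * n_cols == len(boxes)
    return complete and n_rows >= min_rows and n_cols >= min_cols


# Six boxes laid out as 2 rows x 3 columns, with 1px of rendering jitter.
aligned = [
    Box(10, 10, 50, 20), Box(70, 11, 50, 20), Box(130, 10, 50, 20),
    Box(10, 40, 50, 20), Box(71, 40, 50, 20), Box(130, 41, 50, 20),
]
# Three boxes that share no row or column.
staggered = [Box(10, 10, 50, 20), Box(90, 50, 50, 20), Box(30, 90, 50, 20)]
```

Here `looks_tabular(aligned)` is true and `looks_tabular(staggered)` is false. Because the check uses only coordinates, it gives the same answer whether the layout came from a `<table>`, nested `<div>` elements, or CSS positioning, which is the separation of markup from visual appearance that the abstract describes. In practice the boxes would have to come from a rendering engine; that step is outside this sketch.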

Wolfgang Gatterbauer, Paul Bohunsky

AAAI 2006 | Intelligent Agents | Table Extraction | Web Pages | Web Table Extraction

» Towards domain-independent information extraction from web tables

» From Images to Bodies: Modelling and Exploiting Spatial Occlusion and Motion Parallax

Post Info

Added	30 Oct 2010
Updated	30 Oct 2010
Type	Conference
Year	2006
Where	AAAI
Authors	Wolfgang Gatterbauer, Paul Bohunsky
