AAAI
2006

Table Extraction Using Spatial Reasoning on the CSS2 Visual Box Model

Tables on web pages contain a huge amount of semantically explicit information, which makes them a worthwhile target for automatic information extraction and knowledge acquisition from the Web. However, extracting tables from web pages is difficult because HTML was designed to convey visual rather than semantic information. In this paper, we propose a robust technique for table extraction from arbitrary web pages. The technique relies on the positional information of DOM element nodes as rendered in a browser and thereby separates the intricacies of code implementation from the intended visual appearance. The novel aspect of the proposed technique is its effective use of spatial reasoning on the CSS2 visual box model, which shows a high level of robustness even without any form of learning (F-measure 90%). We describe the ideas behind our approach, the tabular pattern recognition algorithm operating on a double topographical grid s...
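To make the idea of reasoning over rendered positions concrete, here is a minimal sketch. It is not the authors' algorithm (the abstract is truncated before it is described). It assumes only that each rendered element is available as a rectangle with pixel coordinates. The names `Box`, `cluster`, `index_of`, `to_grid` and the tolerance `tol` are hypothetical, introduced purely for illustration. The sketch treats a set of boxes as a table when their top and left edges align into a complete grid of rows and columns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical rendered element: its text plus its bounding rectangle in pixels."""
    text: str
    left: float
    top: float
    right: float
    bottom: float


def cluster(values, tol=2.0):
    """Collapse coordinates into representatives; a value joins a cluster
    if it lies within `tol` of that cluster's smallest value."""
    reps = []
    for v in sorted(values):
        if not reps or v - reps[-1] > tol:
            reps.append(v)
    return reps


def index_of(value, reps, tol=2.0):
    """Return the index of the representative that `value` belongs to."""
    for i, r in enumerate(reps):
        if abs(value - r) <= tol:
            return i
    raise ValueError("value does not match any cluster")


def to_grid(boxes, tol=2.0):
    """Return rows of cell text if the boxes align into a complete grid
    with at least two rows and two columns, otherwise None."""
    if not boxes:
        return None
    tops = cluster([b.top for b in boxes], tol)    # one entry per visual row
    lefts = cluster([b.left for b in boxes], tol)  # one entry per visual column
    if len(tops) < 2 or len(lefts) < 2:
        return None
    grid = [[None] * len(lefts) for _ in tops]
    for b in boxes:
        r = index_of(b.top, tops, tol)
        c = index_of(b.left, lefts, tol)
        if grid[r][c] is not None:  # two boxes claim the same cell
            return None
        grid[r][c] = b.text
    if any(cell is None for row in grid for cell in row):  # a cell is missing
        return None
    return grid
```

Because the check works only on coordinates, it does not matter whether the boxes came from `<table>` markup, nested `<div>` elements, or a list styled with CSS. This is the kind of independence from code implementation that the abstract refers to.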
Wolfgang Gatterbauer, Paul Bohunsky
Added 30 Oct 2010
Updated 30 Oct 2010
Type Conference
Year 2006
Where AAAI
Authors Wolfgang Gatterbauer, Paul Bohunsky