

Visually guided bottom-up table detection and segmentation in web documents

In the AllRight project, we are developing an algorithm for unsupervised table detection and segmentation that uses the visual rendition of a Web page rather than its HTML code. The algorithm works bottom-up: it groups word bounding boxes into larger groups, guided by a set of heuristics. It has already been implemented, and a preliminary evaluation on about 6000 Web documents has been carried out.
Categories and Subject Descriptors: H.3.4 [Information Storage and Retrieval]: Systems and Software; I.7.5 [Document and Text Processing]: Document Capture
General Terms: Algorithms, Experimentation
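The abstract does not spell out the grouping heuristics, so the following is only an illustrative sketch of the general idea of bottom-up grouping of word bounding boxes, not the authors' method. The `Box` type, the `max_gap` threshold, and the rule "merge words that overlap vertically and are separated by a small horizontal gap" are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in rendered page coordinates (hypothetical type)."""
    x0: float
    y0: float
    x1: float
    y1: float


def union(a: Box, b: Box) -> Box:
    """Smallest box that covers both a and b."""
    return Box(min(a.x0, b.x0), min(a.y0, b.y0), max(a.x1, b.x1), max(a.y1, b.y1))


def same_line(a: Box, b: Box, max_gap: float) -> bool:
    """Assumed heuristic: boxes overlap vertically and are horizontally close."""
    v_overlap = min(a.y1, b.y1) - max(a.y0, b.y0)
    h_gap = max(a.x0, b.x0) - min(a.x1, b.x1)  # negative if the boxes overlap
    return v_overlap > 0 and h_gap <= max_gap


def group_words(words: list[Box], max_gap: float = 10.0) -> list[Box]:
    """Greedy single pass: attach each word to the first compatible group.

    This is a simplification; it does not re-merge groups that only become
    adjacent after later words are added.
    """
    groups: list[Box] = []
    for w in sorted(words, key=lambda b: (b.y0, b.x0)):
        for i, g in enumerate(groups):
            if same_line(g, w, max_gap):
                groups[i] = union(g, w)
                break
        else:
            groups.append(w)
    return groups
```

In a sketch like this, a wide horizontal gap between groups on the same line is what would hint at a column boundary; a real system would need further heuristics to decide whether such groups form a table.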
Bernhard Krüpl, Marcus Herzog
Added 22 Nov 2009
Updated 22 Nov 2009
Type Conference
Year 2006
Where WWW
Authors Bernhard Krüpl, Marcus Herzog