Visual information extraction

Typographic and visual information is an integral part of textual documents. Most information extraction systems ignore most of this visual information and process the text as a linear sequence of words, so much valuable information is lost. In this paper, we show how to make use of this visual information for information extraction. We present an algorithm that automatically extracts specific fields of a document (such as the title and author) based exclusively on the document's visual formatting, without any reference to its semantic content. The algorithm employs a machine learning approach: the system is first provided with a set of training documents in which the target fields are manually tagged, and it automatically learns how to extract these fields from future documents. We implemented the algorithm in a system for automatic analysis of documents in PDF format. We present experimental results of applying the system to a set of financial documents, ex...
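
The learning setup described in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm, which the abstract does not specify; it swaps in a plain 1-nearest-neighbour classifier over a few layout features. The `TextBlock` type, the feature choice (relative font size, boldness, vertical position, centering), and the sample documents are all assumptions made up for illustration. The one property it is meant to show is that the text content is never inspected: fields are assigned from formatting alone.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TextBlock:
    """A block of text with layout attributes (hypothetical representation)."""
    text: str
    font_size: float
    bold: bool
    y_pos: float      # 0.0 = top of page, 1.0 = bottom
    centered: bool


def featurize(doc):
    """Map each block to visual features only; block.text is never read."""
    max_size = max(b.font_size for b in doc)
    return [
        (b.font_size / max_size, float(b.bold), b.y_pos, float(b.centered))
        for b in doc
    ]


def train(tagged_docs):
    """Collect (features, label) pairs from manually tagged documents."""
    model = []
    for doc, labels in tagged_docs:
        model.extend(zip(featurize(doc), labels))
    return model


def extract_fields(doc, model):
    """Label each block by its nearest training example (1-NN)."""
    fields = {}
    for block, feats in zip(doc, featurize(doc)):
        _, label = min(model, key=lambda ex: math.dist(ex[0], feats))
        if label != "other":
            fields[label] = block.text
    return fields


# One manually tagged training document (made-up values).
training_doc = [
    TextBlock("Quarterly Results", 18, True, 0.05, True),
    TextBlock("A. Smith", 12, False, 0.12, True),
    TextBlock("Revenue rose in the period.", 10, False, 0.50, False),
]
training_labels = ["title", "author", "other"]
model = train([(training_doc, training_labels)])

# A new, untagged document with similar formatting but different text.
new_doc = [
    TextBlock("Annual Report", 20, True, 0.06, True),
    TextBlock("J. Doe", 11, False, 0.14, True),
    TextBlock("Costs were stable this year.", 10, False, 0.60, False),
]
fields = extract_fields(new_doc, model)
print(fields)
```

Normalising font size by the largest size in each document is a design choice of this sketch, so that documents set in different base sizes remain comparable. A real system working on PDF input would first need to recover blocks and their layout attributes from the file.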

Yonatan Aumann, Ronen Feldman, Yair Liberzon, Biny

Documents | Information | KAIS 2006 | Visual Information |

» Recognising Informative Web Page Blocks Using Visual Segmentation for Efficient Informatio...

» DiLiA a Digital Library Assistant A New Approach to Information Discovery through Inform...

» Extracting semantic structure of web documents using content and visual information

» Information Visualization for Knowledge Extraction in Neural Networks

» Information Extraction to Generate Visual Simulations of Car Accidents from Written Descri...

» ViPER augmenting automatic information extraction with visual perceptions

» Visual Web Information Extraction with Lixto

» Fusion of Range and Visual Data for the Extraction of Scene Structure Information

Post Info

Added	13 Dec 2010
Updated	13 Dec 2010
Type	Journal
Year	2006
Where	KAIS
Authors	Yonatan Aumann, Ronen Feldman, Yair Liberzon, Binyamin Rosenfeld, Jonathan Schler
