Information extraction from structured documents using k-testable tree automaton inference

Information extraction (IE) addresses the problem of extracting specific information from a collection of documents. Much of the previous work on IE from structured documents, such as HTML or XML, uses learning techniques that are based on strings, such as finite automata induction. These methods do not exploit the tree structure of the documents. A natural way to exploit that structure is to induce tree automata, which are like finite-state automata but process trees instead of strings. In this work, we explore induction of k-testable ranked tree automata from a small set of annotated examples. We describe three variants which differ in the way they generalize the inferred automaton. Experimental results on a set of benchmark data sets show that our approach compares favorably to string-based approaches. However, the quality of the extraction is still suboptimal.

Raymond Kosala, Hendrik Blockeel, Maurice Bruynooghe, Jan Van den Bussche

Automata Induction | DKE 2006 | Documents | Tree Automata

» XTRACT A System for Extracting Document Type Descriptors from XML Documents

» Information extraction by finding repeated structure

» Tuples Extraction from HTML Using Logic Wrappers and Inductive Logic Programming

» Road network extraction from airborne LiDAR data using scene context

» Extracting XML schema from multiple implicit xml documents based on inductive reasoning

» Rule Learning for Feature Values Extraction from HTML Product Information Sheets

» Exploring syntactic structured features over parse trees for relation extraction using ker...

» Integrating web directories by learning their structures

Post Info

Added	11 Dec 2010
Updated	11 Dec 2010
Type	Journal
Year	2006
Where	DKE
Authors	Raymond Kosala, Hendrik Blockeel, Maurice Bruynooghe, Jan Van den Bussche
