IJCAI
2003

Information Extraction from Web Documents Based on Local Unranked Tree Automaton Inference

Information extraction (IE) aims at extracting specific information from a collection of documents. Much previous work on IE from semi-structured documents (in XML or HTML) uses learning techniques based on strings. Some recent work converts the document to a ranked tree and uses tree automaton induction. This paper introduces an algorithm that uses unranked trees to induce an automaton. Experiments show that this gives the best results obtained so far for IE from semi-structured documents based on learning.
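The contrast between ranked and unranked trees can be illustrated with a small sketch. It parses a markup fragment into an unranked tree (each node may have any number of children) and then converts it to a binary, that is ranked, tree using the first-child/next-sibling encoding. This encoding is an assumption chosen for illustration; the abstract does not say which conversion earlier work used, and the sketch does not implement automaton induction. The names `Node`, `TreeBuilder`, `parse_unranked`, and `to_binary` are hypothetical.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import List, Optional


@dataclass
class Node:
    """Unranked tree node: a label with any number of ordered children."""
    label: str
    children: List["Node"] = field(default_factory=list)


class TreeBuilder(HTMLParser):
    """Builds an unranked tree from well-formed markup (illustrative only)."""

    def __init__(self) -> None:
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        # Assumes every start tag has a matching end tag.
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.stack[-1].children.append(Node("#text:" + text))


def parse_unranked(markup: str) -> Node:
    """Parse markup and return the artificial '#root' node of the tree."""
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root


def to_binary(siblings: List[Node]) -> Optional[tuple]:
    """First-child/next-sibling encoding of a sibling list.

    Returns (label, encoded children, encoded following siblings),
    or None for an empty list. Every node ends up with exactly two slots.
    """
    if not siblings:
        return None
    head, rest = siblings[0], siblings[1:]
    return (head.label, to_binary(head.children), to_binary(rest))


if __name__ == "__main__":
    tree = parse_unranked("<tr><td>Name</td><td>Price</td></tr>")
    print(tree)                      # unranked: 'tr' has two 'td' children
    print(to_binary(tree.children))  # ranked: second 'td' hangs off the first
```

In the unranked tree the two `td` cells are siblings under `tr`; in the binary encoding the second cell becomes the right-hand descendant of the first, so sibling relations are no longer directly visible as parent-child structure.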
Added 31 Oct 2010
Updated 31 Oct 2010
Type Conference
Year 2003
Where IJCAI
Authors Raymond Kosala, Maurice Bruynooghe, Jan Van den Bussche, Hendrik Blockeel