Information extraction from HTML pages has been conventionally treated as plain text documents extended with HTML tags. However, the growing maturity and correct usage of HTML/XHT...
Web pages contain clutter (such as ads, unnecessary images and extraneous links) around the body of an article, which distracts a user from actual content. Extraction of "use...
We present a browser-extending Semantic Web extraction system that maps HTML documents to tables and, where possible, to rules. First, the basic data extractor ViPER distills and ...
This paper discusses a methodology for applying general-purpose first-order inductive learning to extract information from Web documents structured as unranked ordered trees. The...
Information extraction (IE) addresses the problem of extracting specific information from a collection of documents. Much of the previous work on IE from structured documents, suc...
Raymond Kosala, Hendrik Blockeel, Maurice Bruynoog...