
RULEML
2004
Springer

Rule Learning for Feature Values Extraction from HTML Product Information Sheets

The Web is now a huge information repository with a rich semantic structure, but that structure is aimed primarily at human readers rather than at automated processing by a computer. Collecting product information from the Web and organizing it in a form suitable for machine processing is a primary task of software shopping agents and has received a lot of attention in recent years. In this paper we assume that product information is represented as a set of feature-value pairs contained in an HTML product information sheet, which is usually formatted using HTML tables. The paper presents a technique for learning rules that extract product information from such sheets. The technique exploits the fact that the Web pages presenting a given producer's product information are generated on the fly from the producer's database and therefore exhibit a uniform structure. Consequently, while the extraction task is executed manually...
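To make the representation described in the abstract concrete, the following is a minimal sketch of pulling feature-value pairs out of an HTML table. It is not the rule-learning technique of the paper: the single extraction rule here (keep every table row with exactly two non-empty cells) is fixed by hand, whereas the paper learns its rules. The names `FeatureValueParser`, `extract_feature_values` and `SAMPLE`, as well as the sample product data, are hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser


class FeatureValueParser(HTMLParser):
    """Collects (feature, value) pairs from table rows with exactly two cells.

    Hypothetical helper: the rule is hand-written, not learned.
    """

    def __init__(self):
        super().__init__()
        self.pairs = []    # extracted (feature, value) tuples
        self._row = None   # cell texts of the current row, or None outside a row
        self._cell = None  # text fragments of the current cell, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._cell = None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            # Keep only rows that look like a feature-value pair.
            if len(self._row) == 2 and all(self._row):
                self.pairs.append((self._row[0], self._row[1]))
            self._row = None
            self._cell = None


def extract_feature_values(html):
    """Return a dict mapping feature names to values found in `html`."""
    parser = FeatureValueParser()
    parser.feed(html)
    parser.close()
    return dict(parser.pairs)


# Invented sample sheet; the heading row has one cell and is therefore skipped.
SAMPLE = """
<table>
  <tr><th colspan="2">Specifications</th></tr>
  <tr><td>Weight</td><td>1.2 kg</td></tr>
  <tr><td>Screen size</td><td>14 in</td></tr>
</table>
"""

features = extract_feature_values(SAMPLE)
```

Because pages generated from one producer's database share a uniform structure, a rule like this one, once it fits a single sheet, could plausibly be reused on the producer's other sheets; the sketch relies on that assumption and makes no attempt to handle nested tables or other layouts.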
Costin Badica, Amelia Badica
Added 02 Jul 2010
Updated 02 Jul 2010
Type Conference
Year 2004
Where RULEML