
ADC
2006
Springer

A two-phase rule generation and optimization approach for wrapper generation

Web information extraction is a fundamental issue for web information management and integration. A common approach is to use wrappers to extract data from web pages or documents. However, a critical issue in wrapper development is how to generate extraction rules. In this paper, we propose a novel two-phase rule generation and optimization (2P-RULE) approach for wrapper generation. 2P-RULE consists of an internal rule optimization (IRO) process and an external rule optimization (ERO) process. In IRO, a user first uses a GUI to map useful values in a web page to a schema that the user specifies according to the target web information. Based on this mapping, the system automatically generates a rule list for the schema. In ERO, the user can create multiple mappings to generate further rule lists. All the acquired rule lists are then merged and refined into one optimized rule list, which is expressed in XQuery as the final set of extraction rules. Experiments show tha...
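The abstract does not specify the rule format or how rule lists are merged. As a rough illustration only, the following Python sketch assumes that each mapping yields a rule list pairing a schema field with an XPath-style positional path, and that merging generalizes those paths by dropping position indices wherever the mappings disagree. It then renders the merged rules as a small XQuery module. The names `merge_rule_lists`, `merge_paths`, `to_xquery`, and the external variable `$page` are hypothetical, the page is assumed to be parsed into a namespace-free XML tree, and this is not the 2P-RULE algorithm itself.

```python
from typing import Dict, List

# Hypothetical rule-list format (an assumption, not from the paper):
# schema field name -> positional path such as "/html/body/table/tr[2]/td[1]".
RuleList = Dict[str, str]


def _steps(path: str) -> List[str]:
    """Split a path into its non-empty location steps."""
    return [s for s in path.split("/") if s]


def _tag(step: str) -> str:
    """Return the element name of a step, without any [n] position index."""
    return step.split("[", 1)[0]


def merge_paths(paths: List[str]) -> str:
    """Generalize several example paths for one field into a single path.

    Steps that agree in every example are kept unchanged; steps that share an
    element name but differ in position lose the index, so they match all
    such siblings. Anything else is rejected in this simplified sketch.
    """
    split = [_steps(p) for p in paths]
    if len({len(s) for s in split}) != 1:
        raise ValueError("paths of different depth are not handled here")
    merged = []
    for column in zip(*split):
        if len(set(column)) == 1:
            merged.append(column[0])
        else:
            tags = {_tag(s) for s in column}
            if len(tags) != 1:
                raise ValueError(f"conflicting elements {sorted(tags)}")
            merged.append(tags.pop())
    return "/" + "/".join(merged)


def merge_rule_lists(rule_lists: List[RuleList]) -> RuleList:
    """Merge rule lists from several mappings, keeping fields present in all."""
    if not rule_lists:
        raise ValueError("need at least one rule list")
    common = [f for f in rule_lists[0] if all(f in rl for rl in rule_lists[1:])]
    return {f: merge_paths([rl[f] for rl in rule_lists]) for f in common}


def to_xquery(rules: RuleList) -> str:
    """Render merged rules as an XQuery main module over an external $page."""
    parts = [
        f"  (for $v in $page{path} return <{field}>{{string($v)}}</{field}>)"
        for field, path in rules.items()
    ]
    return (
        "declare variable $page external;\n"
        "<result>{\n" + ",\n".join(parts) + "\n}</result>"
    )
```

For example, two hypothetical mappings that pick `title` and `price` from rows 1 and 3 of the same table merge into the paths `/html/body/table/tr/td[1]` and `/html/body/table/tr/td[2]`, which then match every row. Generalizing only the differing indices is a deliberately simple choice; a real wrapper generator would also need to handle paths of different depth and keep values from the same record together.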
Yanan Hao, Yanchun Zhang
Added 13 Jun 2010
Updated 13 Jun 2010
Type Conference
Year 2006
Where ADC
Authors Yanan Hao, Yanchun Zhang