

Automatically Generating Labeled Examples for Web Wrapper Maintenance

For software programs to benefit fully from semi-structured web sources, wrapper programs must be built to provide a “machine-readable” view over them. A significant problem with this approach is that, since web sources are autonomous, they may undergo changes that invalidate the current wrapper. In this paper, we address this problem by introducing novel heuristics and algorithms for automatically maintaining wrappers. In our approach, the system collects some query results during normal wrapper operation and, when the source changes, uses them as input to generate a set of labeled examples for the source, which can then be used to induce a new wrapper. Our experiments show that the proposed techniques achieve high accuracy for a wide range of real-world web data extraction problems.
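
The idea of reusing stored query results to label a changed page can be illustrated with a minimal sketch. It is not the paper's method: it assumes plain exact string matching in place of the heuristics the abstract mentions, and the names `LabeledExample` and `generate_labeled_examples`, as well as the sample data, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LabeledExample:
    field: str   # field name taken from a stored query result
    value: str   # value extracted before the source changed
    offset: int  # character offset of that value in the changed page


def generate_labeled_examples(stored_results, new_page):
    """Label values from stored results that still occur in the changed page.

    stored_results: list of dicts (field name -> value) collected while the
    old wrapper was still working (assumed input format).
    new_page: text of the page after the source changed.
    A record is used only if every one of its values is found.
    """
    examples = []
    for record in stored_results:
        labeled = []
        for field, value in record.items():
            offset = new_page.find(value)  # exact match; -1 if absent
            if offset == -1:
                labeled = []  # incomplete record: discard it
                break
            labeled.append(LabeledExample(field, value, offset))
        examples.extend(labeled)
    return examples


# Hypothetical data: results stored earlier, and the page after a redesign.
stored = [
    {"title": "Dune", "price": "9.99"},
    {"title": "Emma", "price": "4.50"},
]
page = "<li><b>Dune</b> <i>9.99</i></li><li><b>Ulysses</b> <i>7.25</i></li>"
examples = generate_labeled_examples(stored, page)
```

Here only the first record survives, because "Emma" no longer appears in the page. The resulting examples are what a wrapper induction step would consume; a realistic system would also need to handle values that were reformatted or that occur several times in the page.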
Added 28 Jun 2010
Updated 28 Jun 2010
Type Conference
Year 2005
Where WEBI
Authors Juan Raposo, Alberto Pan, Manuel Álvarez, Justo Hidalgo