Automatically Generating Labeled Examples for Web Wrapper Maintenance

To let software programs gain full benefit from semi-structured web sources, wrapper programs must be built to provide a “machine-readable” view over them. A significant problem with this approach is that, since web sources are autonomous, they may undergo changes that invalidate the current wrapper. In this paper, we address this problem by introducing novel heuristics and algorithms for automatically maintaining wrappers. In our approach, the system collects some query results during normal wrapper operation and, when the source changes, uses them as input to generate a set of labeled examples for the source, which can then be used to induce a new wrapper. Our experiments show that the proposed techniques achieve high accuracy for a wide range of real-world web data extraction problems.
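
The abstract does not spell out the heuristics themselves. As a rough illustration of the general idea only (not the authors' algorithm), the sketch below looks up values stored from earlier query results in the text of a changed page and records where each one reappears, producing labeled examples that a wrapper induction step could consume. The names `LabeledExample` and `generate_labeled_examples`, the exact-substring matching, and the sample data are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledExample:
    field: str   # name of the field in the stored record
    value: str   # stored value that was found again in the new page
    start: int   # character offset of the first match in the new page


def generate_labeled_examples(stored_records, new_page):
    """Label occurrences of previously extracted values in a changed page.

    Hypothetical helper: uses plain exact-substring search, which is far
    simpler than the heuristics the paper refers to.
    """
    examples = []
    for record in stored_records:
        for field, value in record.items():
            start = new_page.find(value)
            # Empty values and values that no longer appear yield no example.
            if value and start != -1:
                examples.append(LabeledExample(field, value, start))
    return examples


# Results collected while the old wrapper still worked (made-up data).
stored = [{"title": "Dune", "price": "9.99"},
          {"title": "Emma", "price": "4.50"}]

# The same query answered by the source after a hypothetical redesign.
new_html = ("<ul><li><b>Dune</b> <i>9.99</i></li>"
            "<li><b>Emma</b> <i>4.75</i></li></ul>")

examples = generate_labeled_examples(stored, new_html)
for ex in examples:
    print(ex.field, ex.value, ex.start)
```

In this toy run the second price changed between the stored result and the new page, so no example is produced for it. A real system would need some tolerance for values that change or appear more than once.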

Juan Raposo, Alberto Pan, Manuel Álvarez, J

Internet Technology | Normal Wrapper Operation | Semi-structured Web Sources | Web Sources | WEBI 2005 |

» Automatically Maintaining Wrappers for Web Sources

» Schema-guided wrapper maintenance for web-data extraction

» Site-Wide Wrapper Induction for Life Science Deep Web Databases

» Learning the Common Structure of Data

» Ontology Guided Autonomous Label Assignment in Wrapper Induced Tables with Missing Column ...

» Fully automatic wrapper generation for search engines

» A Layout-Independent Web News Article Contents Extraction Method Based on Relevance Analysi...

» Web data extraction based on partial tree alignment

Post Info

Added	28 Jun 2010
Updated	28 Jun 2010
Type	Conference
Year	2005
Where	WEBI
Authors	Juan Raposo, Alberto Pan, Manuel Álvarez, Justo Hidalgo
