Automatically Maintaining Wrappers for Web Sources



A substantial portion of web data follows some kind of underlying structure. Nevertheless, HTML carries no schema or semantic information about the data it represents. A program that gives software applications a structured view of these semi-structured web sources is usually called a wrapper. A wrapper accepts a query against the source and returns a set of structured results, so applications can access web data much as they would access information in a database. A significant problem with this approach is that web sources may undergo changes that invalidate the current wrappers. In this paper, we present novel heuristics and algorithms to address this problem. Our approach is based on collecting some query results during wrapper operation. Then, when the source changes, these stored results are used to generate a set of labeled examples, which are provided as input to a wrapper induction algorithm able to regenerate the wrapper. We have te...
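The idea of reusing stored query results to label a changed page can be illustrated with a small sketch. This is not the paper's algorithm: the names `ResultStore` and `label_examples` are hypothetical, and the matching rule (exact substring search, keeping a record only when all of its fields are found) is a simplifying assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ResultStore:
    """Hypothetical store of query results collected while the wrapper still worked."""
    results: dict = field(default_factory=dict)  # query -> list of records

    def record(self, query: str, records: list) -> None:
        self.results.setdefault(query, []).extend(records)


def label_examples(page_text: str, stored_records: list) -> list:
    """Find stored field values in the changed page and return labeled spans.

    Assumption for this sketch: a record counts as a labeled example only if
    every one of its field values occurs verbatim in the new page.
    """
    examples = []
    for rec in stored_records:
        spans = {}
        for name, value in rec.items():
            pos = page_text.find(value)
            if pos == -1:
                break  # this record is not present in the new page
            spans[name] = (pos, pos + len(value))
        else:
            examples.append(spans)
    return examples


# Results gathered earlier, during normal wrapper operation (made-up data).
store = ResultStore()
store.record("databases", [
    {"title": "Web Wrappers", "price": "25.00"},
    {"title": "Old Book", "price": "9.99"},
])

# The source has changed its markup; the old wrapper no longer parses it.
new_page = "<div><b>Web Wrappers</b><i>25.00</i></div>"
examples = label_examples(new_page, store.results["databases"])
```

Each entry in `examples` maps a field name to the character span where its value was found in the new page. Spans of this kind are what a wrapper induction algorithm would take as labeled input. The induction step itself is outside the scope of this sketch.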

Juan Raposo, Alberto Pan, Manuel Álvarez, Justo Hidalgo


Database | IDEAS 2005 | Semi-structured Web Sources | Web Data | Web Sources |


» Automatically Generating Labeled Examples for Web Wrapper Maintenance

» Automatic wrapper maintenance for semistructured web sources using results from previous q...

» Semi-Automatic Wrapper Generation for Internet Information Sources

» ITPilot: A Toolkit for Industrial-Strength Web Data Extraction

» Learning the Common Structure of Data

» Schema-guided wrapper maintenance for web-data extraction

» Wrapper Generation for Web Accessible Data Sources

» Ontology Guided Autonomous Label Assignment in Wrapper Induced Tables with Missing Column ...

Post Info

Added	25 Jun 2010
Updated	25 Jun 2010
Type	Conference
Year	2005
Where	IDEAS
Authors	Juan Raposo, Alberto Pan, Manuel Álvarez, Justo Hidalgo

