Sciweavers

VLDB
2004
ACM

An Automatic Data Grabber for Large Web Sites

We demonstrate a system that automatically grabs data from data-intensive web sites. The system first infers a model that describes the web site at the intensional level as a collection of classes; each class represents a set of structurally homogeneous pages and is associated with a small set of representative pages. Based on the model, a library of wrappers, one per class, is then inferred with the help of an external wrapper generator. The model, together with the library of wrappers, can then be used to navigate the site and extract the data.
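The pipeline described in the abstract (a site model made of page classes, one wrapper per class, then extraction) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names PageClass, make_regex_wrapper and extract_site, the sample pages, and the use of regular expressions as a stand-in for the external wrapper generator are hypothetical and are not taken from the paper. Pages are also assumed to be already assigned to their class, which the real system infers.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class PageClass:
    """Hypothetical model entry: one class of structurally homogeneous pages."""
    name: str
    representative_urls: List[str]


# A wrapper maps the HTML of one page to a record of extracted fields.
Wrapper = Callable[[str], Dict[str, str]]


def make_regex_wrapper(patterns: Dict[str, str]) -> Wrapper:
    """Stand-in for an external wrapper generator: one regex per field."""
    compiled = {name: re.compile(p, re.S) for name, p in patterns.items()}

    def wrapper(html: str) -> Dict[str, str]:
        record = {}
        for name, rx in compiled.items():
            match = rx.search(html)
            if match:  # fields that do not match are simply left out
                record[name] = match.group(1).strip()
        return record

    return wrapper


def extract_site(
    model: Dict[str, PageClass],
    wrappers: Dict[str, Wrapper],
    pages: Dict[str, Tuple[str, str]],
) -> List[Dict[str, str]]:
    """Apply to each page the wrapper of the class it belongs to."""
    records = []
    for url, (class_name, html) in pages.items():
        if class_name not in model:
            raise KeyError(f"page {url} has unknown class {class_name!r}")
        record = {"url": url, "class": class_name}
        record.update(wrappers[class_name](html))
        records.append(record)
    return records


# Made-up example data: two classes, one page each.
model = {
    "team": PageClass("team", ["/teams/1.html"]),
    "player": PageClass("player", ["/players/7.html"]),
}
wrappers = {
    "team": make_regex_wrapper({"team_name": r"<h1>(.*?)</h1>"}),
    "player": make_regex_wrapper({
        "player_name": r"<h1>(.*?)</h1>",
        "role": r'<span class="role">(.*?)</span>',
    }),
}
pages = {
    "/teams/1.html": ("team", "<html><h1> Rovers </h1></html>"),
    "/players/7.html": (
        "player",
        '<html><h1>A. Keeper</h1><span class="role">goalkeeper</span></html>',
    ),
}
records = extract_site(model, wrappers, pages)
for record in records:
    print(record)
```

Keying both the model and the wrapper library by class name mirrors the "one wrapper per class" idea: supporting a new kind of page means adding one class and one wrapper, without changing the extraction loop.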
Added 02 Jul 2010
Updated 02 Jul 2010
Type Conference
Year 2004
Where VLDB
Authors Valter Crescenzi, Giansalvatore Mecca, Paolo Merialdo, Paolo Missier