An Automatic Data Grabber for Large Web Sites

We demonstrate a system to automatically grab data from data-intensive web sites. The system first infers a model that describes the web site, at the intensional level, as a collection of classes; each class represents a set of structurally homogeneous pages and is associated with a small set of representative pages. Based on the model, a library of wrappers, one per class, is then inferred with the help of an external wrapper generator. The model, together with the library of wrappers, can thus be used to navigate the site and extract the data.
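The class-inference step described in the abstract can be illustrated with a rough sketch. This is not the authors' algorithm: it assumes, purely for illustration, that two pages are "structurally homogeneous" when they contain the same set of root-to-element tag paths. The names `signature`, `infer_classes`, `build_wrapper_library`, and the `generate_wrapper` callable (standing in for the external wrapper generator) are all hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Elements that never have a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class TagPathCollector(HTMLParser):
    """Collects the set of root-to-element tag paths seen in one page."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the most recent matching open tag.
            while self.stack.pop() != tag:
                pass


def signature(html):
    """Assumed similarity criterion: the set of tag paths in the page."""
    collector = TagPathCollector()
    collector.feed(html)
    collector.close()
    return frozenset(collector.paths)


def infer_classes(pages, max_samples=2):
    """Group pages (url -> html) into classes with identical signatures.

    Each class keeps a small set of representative pages ("samples").
    """
    groups = defaultdict(list)
    for url, html in pages.items():
        groups[signature(html)].append(url)
    return [
        {"signature": sig, "pages": urls, "samples": urls[:max_samples]}
        for sig, urls in groups.items()
    ]


def build_wrapper_library(classes, pages, generate_wrapper):
    """Build one wrapper per class from that class's sample pages.

    generate_wrapper is a hypothetical stand-in for the external generator.
    """
    return [
        generate_wrapper([pages[url] for url in cls["samples"]])
        for cls in classes
    ]
```

Exact signature equality is a deliberately crude criterion; a real system would likely need a tolerance for optional page elements, which this sketch does not attempt.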

Valter Crescenzi, Giansalvatore Mecca, Paolo Merialdo, Paolo Missier

Data Intensive Web | Database | External Wrapper Generator | VLDB 2004 | Web Site

» Data-Driven One-to-One Web Site Generation for Data-Intensive Applications

» Automatic geotagging of Russian web sites

» Automatic Data Extraction from Data-Rich Web Pages

» Web site mining a new way to spot competitors customers and suppliers in the world wide we...

» Web Canary A Virtualized Web Browser to Support Large-Scale Silent Collaboration in Detecti...

» Pollock automatic generation of virtual web services from web sites

» Gaining Insights into Web Customers using Web Intelligence

» Adaptive Site Map Visualization Based on Landmarks

Post Info

Added	02 Jul 2010
Updated	02 Jul 2010
Type	Conference
Year	2004
Where	VLDB
Authors	Valter Crescenzi, Giansalvatore Mecca, Paolo Merialdo, Paolo Missier
