Sciweavers



VLDB
2001
ACM


RoadRunner: Towards Automatic Data Extraction from Large Web Sites



Available from www.vldb.org

The paper investigates techniques for extracting data from HTML sites through the use of automatically generated wrappers. To automate the wrapper generation and the data extraction process, the paper develops a novel technique to compare HTML pages and generate a wrapper based on their similarities and differences. Experimental results on real-life data-intensive Web sites confirm the feasibility of the approach.
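The abstract compresses the core idea into one sentence: compare sample pages and let their differences define the wrapper. The sketch below is a heavily simplified illustration under stated assumptions, not the paper's matching algorithm. It assumes two pages share an identical tag sequence, treats every text token that differs between them as a data field (labeled `#PCDATA`), and simply gives up on tag mismatches, which RoadRunner instead resolves by generalizing the wrapper (for example with optional or repeated patterns). The regex tokenizer, the helper names `tokenize`, `infer_template`, and `extract`, and the sample pages are all hypothetical.

```python
import re
from typing import List

# Hypothetical tokenizer: split HTML into tag tokens and text tokens.
# Assumption for illustration only; it ignores comments, scripts and
# malformed markup, and drops whitespace-only text.
TOKEN_RE = re.compile(r"<[^>]+>|[^<]+")


def tokenize(html: str) -> List[str]:
    tokens = []
    for tok in TOKEN_RE.findall(html):
        tok = tok.strip()
        if tok:
            tokens.append(tok)
    return tokens


def infer_template(page_a: str, page_b: str) -> List[str]:
    """Align two pages token by token. Identical tokens stay as template
    constants; differing text tokens become '#PCDATA' data fields.
    Raises ValueError on length or tag mismatches (not generalized here)."""
    a, b = tokenize(page_a), tokenize(page_b)
    if len(a) != len(b):
        raise ValueError("pages differ in structure; optionals/iterators not handled")
    template = []
    for ta, tb in zip(a, b):
        if ta == tb:
            template.append(ta)
        elif not ta.startswith("<") and not tb.startswith("<"):
            template.append("#PCDATA")  # string mismatch -> data field
        else:
            raise ValueError(f"tag mismatch: {ta!r} vs {tb!r}")
    return template


def extract(template: List[str], page: str) -> List[str]:
    """Apply the inferred wrapper to a page: return the text in each #PCDATA slot."""
    tokens = tokenize(page)
    if len(tokens) != len(template):
        raise ValueError("page does not match template")
    values = []
    for slot, tok in zip(template, tokens):
        if slot == "#PCDATA":
            if tok.startswith("<"):
                raise ValueError(f"expected text, got tag {tok!r}")
            values.append(tok)
        elif slot != tok:
            raise ValueError(f"expected {slot!r}, got {tok!r}")
    return values


if __name__ == "__main__":
    # Two hypothetical pages generated from the same (unknown) template.
    p1 = "<html><b>Author:</b> John Smith <i>2001</i></html>"
    p2 = "<html><b>Author:</b> Paul Jones <i>1999</i></html>"
    template = infer_template(p1, p2)
    p3 = "<html><b>Author:</b> Ann Lee <i>2005</i></html>"
    print(extract(template, p3))  # ['Ann Lee', '2005']
```

The constant label `Author:` survives as part of the template because it is identical on both sample pages, while the name and year vary and therefore become fields; this is the "similarities and differences" idea in its most minimal form.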

Valter Crescenzi, Giansalvatore Mecca, Paolo Merialdo




Related Content

» Fixing Weakly Annotated Web Data Using Relational Models

» An Automatic Data Grabber for Large Web Sites

» Automatic Data Extraction from Data-Rich Web Pages

» Pollock: Automatic Generation of Virtual Web Services from Web Sites

» Extracting Structured Data from Web Pages

» Redundancy-Driven Web Data Extraction and Integration

» Web Data Extraction for Business Intelligence: The Lixto Approach

» Integrating Information to Bootstrap Information Extraction from Web Sites

» Extending an Online Information Site with Accurate Domain-Dependent Extracts from the World...

Post Info

Added	30 Jul 2010
Updated	30 Jul 2010
Type	Conference
Year	2001
Where	VLDB
Authors	Valter Crescenzi, Giansalvatore Mecca, Paolo Merialdo
