Collaborative Wrapping: A Turbo Framework for Web Data Extraction

To access data sources on the Web, a crucial step is wrapping, which translates query responses, rendered in textual HTML, back into their relational form. Traditionally, this problem has been addressed with syntax-based approaches for a single source. However, as online databases multiply, we often need to wrap multiple sources, in particular for domain-based integration. Observing that sources in the same domain usually share common fields, we propose a novel wrapping concept, collaborative wrapping, where multiple sources are extracted concurrently with content-based synchronization to produce consentaneous extractions. Toward this concept, recognizing wrapping as a communication process, we develop the turbo wrapper, upon the insight of turbo codes, a multi-code decoding scheme in information theory. Our experiment shows that the turbo wrapper consistently outperforms baseline single-source methods, is robust, and does benefit from extended scales of source collaboration.

Shui-Lung Chuang, Kevin Chen-Chuan Chang, ChengXiang Zhai

Baseline Single-source Methods | Collaborative Wrapping | Database | ICDE 2007 | Turbo Wrapper |

Added	01 Nov 2009
Updated	01 Nov 2009
Type	Conference
Year	2007
Where	ICDE
Authors	Shui-Lung Chuang, Kevin Chen-Chuan Chang, ChengXiang Zhai
