Sciweavers



WISE
2005
Springer


NET - A System for Extracting Web Data from Flat and Nested Data Records



Download www.cs.uic.edu

This paper studies the automatic extraction of structured data from Web pages. Each such page may contain several groups of structured data records. Existing automatic methods still have several limitations. In this paper, we propose a more effective method for the task. Given a page, our method first builds a tag tree based on visual information. It then performs a post-order traversal of the tree, matching subtrees along the way using a tree edit distance method and visual cues. When the traversal ends, the data records have been found, and the data items in them are aligned and extracted. The method can extract data from both flat and nested data records. Experimental evaluation shows that the method performs the extraction task accurately.
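
The traversal-and-matching idea in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: it assumes a hand-built tag tree (no visual information), substitutes the classic Simple Tree Matching score for the tree edit distance measure, and uses a hypothetical similarity threshold of 0.8. The names `Node`, `similarity`, and `find_records` are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A tag-tree node (hypothetical stand-in for a parsed HTML element)."""
    tag: str
    children: list = field(default_factory=list)


def size(node):
    """Number of nodes in the subtree rooted at `node`."""
    return 1 + sum(size(c) for c in node.children)


def simple_tree_matching(a, b):
    """Maximum number of matching node pairs between two ordered trees."""
    if a.tag != b.tag:
        return 0
    m, n = len(a.children), len(b.children)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            pair = simple_tree_matching(a.children[i - 1], b.children[j - 1])
            table[i][j] = max(table[i - 1][j], table[i][j - 1],
                              table[i - 1][j - 1] + pair)
    return table[m][n] + 1  # +1 for the matching roots


def similarity(a, b):
    """Normalized match score in [0, 1]; 1.0 means identical tag structure."""
    return 2 * simple_tree_matching(a, b) / (size(a) + size(b))


def find_records(root, threshold=0.8):
    """Post-order traversal that reports runs of similar adjacent siblings.

    Returns a list of (parent_tag, [sibling subtrees]) candidates.
    The threshold is an assumption, not a value from the paper.
    """
    found = []

    def visit(node):
        for child in node.children:  # children first: post-order
            visit(child)
        run = []
        for child in node.children:
            if run and similarity(run[-1], child) >= threshold:
                run.append(child)
            else:
                if len(run) >= 2:
                    found.append((node.tag, run))
                run = [child]
        if len(run) >= 2:
            found.append((node.tag, run))

    visit(root)
    return found


def row():
    """One toy product row: an image cell followed by a link cell."""
    return Node("tr", [Node("td", [Node("img")]), Node("td", [Node("a")])])


page = Node("body", [Node("h1"), Node("table", [row(), row(), row()])])
records = find_records(page)
for parent, run in records:
    print(parent, len(run))  # prints "table 3"
```

Because children are visited before their parent, candidate records nested deeper in the tree are reported before the enclosing ones, which is one simple way to surface both flat and nested groups. Aligning the data items inside each record is left out of this sketch.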

Bing Liu, Yanhong Zhai


Data Records | Internet Technology | Structured Data | Structured Data Records | WISE 2005


Related Content

» Extraction of Flat and Nested Data Records from Web Pages

» Automatic User Comment Detection in Flat Internet Fora

» Image Description Mining and Hierarchical Clustering on Data Records Using HRTree

» Deep web data extraction

» FiVaTech PageLevel Web Data Extraction from Template Pages

» ObjectRunner Lightweight Targeted Extraction and Querying of Structured Web Data

» UREST an unsupervised record extraction system

» Viewing WISs as Database Applications

» From HTML documents to web tables and rules

Post Info

Added	25 Jun 2010
Updated	25 Jun 2010
Type	Conference
Year	2005
Where	WISE
Authors	Bing Liu, Yanhong Zhai
