WISE
2005
Springer

NET - A System for Extracting Web Data from Flat and Nested Data Records

This paper studies the automatic extraction of structured data from Web pages. Each such page may contain several groups of structured data records. Existing automatic methods still have several limitations. In this paper, we propose a more effective method for the task. Given a page, our method first builds a tag tree based on visual information. It then performs a post-order traversal of the tree, matching subtrees along the way using a tree edit distance method and visual cues. When the traversal ends, the data records have been found, and the data items in them are aligned and extracted. The method can extract data from both flat and nested data records. Experimental evaluation shows that the method performs the extraction task accurately.
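
The traversal described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Node` class, the `threshold` and `min_size` parameters, and all function names are hypothetical. A restricted top-down tree-matching score stands in for the tree edit distance method, and the visual cues and the data item alignment step are left out. The sketch visits the tag tree in post-order and, at each node, groups runs of adjacent child subtrees that are similar enough to be treated as data records.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tag-tree node: a tag name plus child nodes."""
    tag: str
    children: list = field(default_factory=list)


def size(node):
    """Number of nodes in the subtree rooted at `node`."""
    return 1 + sum(size(c) for c in node.children)


def tree_match(a, b):
    """Size of a maximum top-down, order-preserving matching of a and b.

    Assumption: this restricted matching replaces the tree edit distance
    mentioned in the abstract; it is cheaper but less general.
    """
    if a.tag != b.tag:
        return 0
    m, n = len(a.children), len(b.children)
    # table[i][j]: best matching between the first i children of a
    # and the first j children of b.
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            table[i][j] = max(
                table[i - 1][j],
                table[i][j - 1],
                table[i - 1][j - 1]
                + tree_match(a.children[i - 1], b.children[j - 1]),
            )
    return table[m][n] + 1  # +1 for the matched roots


def similarity(a, b):
    """Normalized match score in [0, 1]; 1.0 means identical structure."""
    return 2 * tree_match(a, b) / (size(a) + size(b))


def find_record_groups(root, threshold=0.8, min_size=2):
    """Post-order traversal that collects runs of similar sibling subtrees.

    Returns (parent, [records]) pairs. Children are visited before their
    parent, so nested groups are reported before the groups enclosing them.
    """
    groups = []

    def visit(node):
        for child in node.children:
            visit(child)
        run = []
        # The trailing None is a sentinel that flushes the last run.
        for child in node.children + [None]:
            if (child is not None and run and size(child) >= min_size
                    and similarity(run[-1], child) >= threshold):
                run.append(child)
                continue
            if len(run) > 1:
                groups.append((node, run))
            keep = child is not None and size(child) >= min_size
            run = [child] if keep else []

    visit(root)
    return groups


# Usage: three structurally identical rows followed by an unrelated node.
row = lambda: Node("tr", [Node("td"), Node("td")])
page = Node("table", [row(), row(), row(), Node("div")])
for parent, records in find_record_groups(page):
    print(parent.tag, [r.tag for r in records])
```

In this sketch only adjacent siblings are compared, and single-node subtrees are skipped through `min_size` so that bare leaf tags do not count as records. Both choices are simplifications made for illustration.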
Bing Liu, Yanhong Zhai
Added 25 Jun 2010
Updated 25 Jun 2010
Type Conference
Year 2005
Where WISE
Authors Bing Liu, Yanhong Zhai