WWW 2011, ACM

HyLiEn: a hybrid approach to general list extraction on the web

We consider the problem of automatically extracting general lists from the Web. Existing approaches mostly depend on either the underlying HTML markup or the visual structure of the Web page. We present HyLiEn, an unsupervised, Hybrid approach for automatic List discovery and Extraction on the Web. It employs general assumptions about the visual rendering of lists and the structural representation of the items they contain. We show that our method significantly outperforms existing methods.

Categories and Subject Descriptors: H.2.8 [Database Management]: Database Applications - Data mining; H.3.1 [Content Analysis and Indexing]: [structured data extraction]

Keywords: Web lists, Web mining, Web information integration
Added 15 May 2011
Updated 15 May 2011
Type Journal
Year 2011
Where WWW
Authors Fabio Fumarola, Tim Weninger, Rick Barber, Donato Malerba, Jiawei Han