Sciweavers

WWW
2011
ACM

Growing parallel paths for entity-page discovery

In this paper, we use the structural and relational information on the Web to find entity-pages. Specifically, given a Web site and an entity-page (e.g., department and faculty member homepage), we seek to find all of the entity-pages of the same type (e.g., all faculty members in the department). To do this, we propose a web structure mining method which grows parallel paths through the web graph and DOM trees. We show that by utilizing these parallel paths we can efficiently discover all entity-pages of the same type. Finally, we demonstrate the accuracy of our method with a case study on various domains.

Categories and Subject Descriptors: H.2.8 [Database Management]: Database applications - data mining; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval

General Terms: Algorithms, Experimentation

Keywords: parallel paths, entity pages, semi-structured data, web structure mining
Added 15 May 2011
Updated 15 May 2011
Type Journal
Year 2011
Where WWW
Authors Tim Weninger, Fabio Fumarola, Cindy Xide Lin, Rick Barber, Jiawei Han, Donato Malerba