

GoGetIt!: a tool for generating structure-driven web crawlers

We present GoGetIt!, a tool for generating structure-driven crawlers that requires minimal effort from users. The tool takes as input a sample page and an entry point to a Web site and generates a structure-driven crawler based on navigation patterns, that is, sequences of patterns for the links a crawler has to follow to reach pages structurally similar to the sample page. In our experiments, structure-driven crawlers generated by GoGetIt! collected all pages matching the given samples, including pages added to the site after the crawlers were generated.
Categories and Subject Descriptors H.3.3 [Information Systems]: Information Search and Retrieval--Clustering, Search process
General Terms Algorithms, Experimentation
Keywords Web Crawlers, Tree Edit Distance, Web Data Extraction
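To make the idea of a navigation pattern concrete, here is a minimal sketch in Python. It is not the GoGetIt! implementation: it assumes each step of the pattern is a regular expression over link URLs, whereas the tool derives its patterns from page structure. The `SITE` dictionary, the `get_links` helper, and the URL shapes are hypothetical and stand in for fetching and parsing real pages.

```python
import re
from typing import Callable, Dict, List, Set

def crawl_with_pattern(entry: str,
                       pattern: List[str],
                       get_links: Callable[[str], List[str]]) -> List[str]:
    """Apply one link pattern per level, starting at `entry`.

    Returns the pages reached after the last pattern, in discovery order.
    """
    frontier = [entry]
    for link_regex in pattern:
        compiled = re.compile(link_regex)
        next_frontier: List[str] = []
        seen: Set[str] = set()
        for page in frontier:
            for url in get_links(page):
                # Follow only links that match the pattern for this level.
                if compiled.fullmatch(url) and url not in seen:
                    seen.add(url)
                    next_frontier.append(url)
        frontier = next_frontier
    return frontier

# Hypothetical toy site: page URL -> outgoing links (assumption, for illustration only).
SITE: Dict[str, List[str]] = {
    "/": ["/authors", "/about"],
    "/authors": ["/authors/a", "/authors/b", "/help"],
    "/authors/a": ["/book/1", "/book/2", "/"],
    "/authors/b": ["/book/2", "/book/3"],
}

def get_links(page: str) -> List[str]:
    return SITE.get(page, [])

# Assumed navigation pattern: index page, then author pages, then book pages.
PATTERN = [r"/authors", r"/authors/\w+", r"/book/\d+"]

targets = crawl_with_pattern("/", PATTERN, get_links)
print(targets)
```

Because the pattern describes where target pages sit in the site rather than listing them, rerunning the same crawler after a new book link appears under an author page would pick up the new page without regenerating the pattern.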
Added 22 Nov 2009
Updated 22 Nov 2009
Type Conference
Year 2006
Where WWW
Authors Altigran Soares da Silva, Edleno Silva de Moura, João M. B. Cavalcanti, Márcio L. A. Vidal