Sciweavers

» Crawling the web for structured documents
EDBTW
2010
Springer
13 years 7 months ago
Using visual pages analysis for optimizing web archiving
Due to the growing importance of the World Wide Web, archiving it has become crucial for preserving a useful source of information. To maintain a web archive up-to-date, crawlers ha...
Myriam Ben Saad, Stéphane Gançarski
PVLDB
2008
13 years 8 months ago
Google's Deep Web crawl
The Deep Web, i.e., content hidden behind HTML forms, has long been acknowledged as a significant gap in search engine coverage. Since it represents a large portion of the structu...
Jayant Madhavan, David Ko, Lucja Kot, Vignesh Gana...
NIPS
2000
13 years 10 months ago
The Missing Link - A Probabilistic Model of Document Content and Hypertext Connectivity
We describe a joint probabilistic model for modeling the contents and inter-connectivity of document collections such as sets of web pages or research paper archives. The model is...
David A. Cohn, Thomas Hofmann
SIGIR
2008
ACM
13 years 8 months ago
Exploring traversal strategy for web forum crawling
In this paper, we study the problem of Web forum crawling. Web forums have now become an important data source for many Web applications, while forum crawling is still a challenging ...
Yida Wang, Jiang-Ming Yang, Wei Lai, Rui Cai, Lei ...
DASFAA
2007
IEEE
14 years 2 months ago
Graph Structure of the Korea Web
The study of the Web graph not only yields valuable insight into Web algorithms for crawling, searching and community discovery, and the sociological phenomena that characterize it...
In Kyu Han, Sang Ho Lee, Soowon Lee