Sciweavers

178 search results for "Scheduling Algorithms for Web Crawling"
CIKM 2009, Springer
Graph-based seed selection for web-scale crawlers
One of the most important steps in web crawling is determining the starting points, or seed selection. This paper identifies and explores the problem of seed selection in webscal...
Shuyi Zheng, Pavel Dmitriev, C. Lee Giles
SIGIR 2005, ACM
Server selection methods in hybrid portal search
The TREC .GOV collection makes a valuable web testbed for distributed information retrieval methods because it is naturally partitioned and includes 725 web-oriented queries with ...
David Hawking, Paul Thomas
INFOCOM 2002, IEEE
Session-Based Overload Control in QoS-Aware Web Servers
With the explosive use of the Internet, contemporary web servers are susceptible to overloads; their services deteriorate drastically and often result in denial of service. In this ...
Huamin Chen, Prasant Mohapatra
WWW 2001, ACM
Effective Web data extraction with standard XML technologies
We discuss the problem of Web data extraction and describe an XML-based methodology whose goal extends far beyond simple "screen scraping." An ideal data extraction proc...
Jussi Myllymaki
WWW 2005, ACM
Three-level caching for efficient query processing in large Web search engines
Large web search engines have to answer thousands of queries per second with interactive response times. Due to the sizes of the data sets involved, often in the range of multiple...
Xiaohui Long, Torsten Suel