Broad web search engines as well as many more specialized search tools rely on web crawlers to acquire large collections of pages for indexing and analysis. Such a web crawler may...
Search engines rely largely on Web robots to collect information from the Web. Due to the unregulated, open-access nature of the Web, robot activities are extremely diverse. Such c...
This paper presents estimation methods that compute the probabilities of how often web pages will be downloaded and modified, respectively, in future crawls. The methods can ...
This paper describes the current state of RUgle, a system for classifying and indexing papers made available on the World Wide Web, in a domain-independent and universal manner. B...
We describe a joint probabilistic model for modeling the contents and inter-connectivity of document collections such as sets of web pages or research paper archives. The model is...