Search Sciweavers | Sciweavers

» User-centric Web crawling

ICDE
2002
IEEE

Design and Implementation of a High-Performance Distributed Web Crawler

16 years 8 months ago

Download cis.poly.edu

Broad web search engines as well as many more specialized search tools rely on web crawlers to acquire large collections of pages for indexing and analysis. Such a web crawler may...

Vladislav Shkapenyuk, Torsten Suel

WWW
2007
ACM

A large-scale study of robots.txt

16 years 8 months ago

Download www2007.org

Search engines largely rely on Web robots to collect information from the Web. Due to the unregulated open-access nature of the Web, robot activities are extremely diverse. Such c...

Yang Sun, Ziming Zhuang, C. Lee Giles

ICCS
2007
Springer

Estimating the Change of Web Pages

15 years 11 months ago

Download dblab.ssu.ac.kr

This paper presents the estimation methods computing the probabilities of how many times web pages are downloaded and modified, respectively, in the future crawls. The methods can ...

Sung Jin Kim, Sang Ho Lee

MAICS
2004

Creation of a Style Independent Intelligent Autonomous Citation Indexer to Support Academic Research

15 years 8 months ago

Download cs.roosevelt.edu

This paper describes the current state of RUgle, a system for classifying and indexing papers made available on the World Wide Web, in a domain-independent and universal manner. B...

Eric G. Berkowitz, Mohamed Reda Elkhadiri

NIPS
2000

The Missing Link - A Probabilistic Model of Document Content and Hypertext Connectivity

15 years 8 months ago

Download www.cs.cmu.edu

We describe a joint probabilistic model for modeling the contents and inter-connectivity of document collections such as sets of web pages or research paper archives. The model is...

David A. Cohn, Thomas Hofmann
