Sciweavers

» Crawling the Hidden Web
WWW
2007
ACM
14 years 8 months ago
A large-scale study of robots.txt
Search engines largely rely on Web robots to collect information from the Web. Due to the unregulated open-access nature of the Web, robot activities are extremely diverse. Such c...
Yang Sun, Ziming Zhuang, C. Lee Giles
ICCS
2007
Springer
13 years 11 months ago
Estimating the Change of Web Pages
This paper presents the estimation methods computing the probabilities of how many times web pages are downloaded and modified, respectively, in the future crawls. The methods can ...
Sung Jin Kim, Sang Ho Lee
MAICS
2004
13 years 9 months ago
Creation of a Style Independent Intelligent Autonomous Citation Indexer to Support Academic Research
This paper describes the current state of RUgle, a system for classifying and indexing papers made available on the World Wide Web, in a domain-independent and universal manner. B...
Eric G. Berkowitz, Mohamed Reda Elkhadiri
NIPS
2000
13 years 9 months ago
The Missing Link - A Probabilistic Model of Document Content and Hypertext Connectivity
We describe a joint probabilistic model for modeling the contents and inter-connectivity of document collections such as sets of web pages or research paper archives. The model is...
David A. Cohn, Thomas Hofmann
ASONAM
2009
IEEE
14 years 2 months ago
Prying Data out of a Social Network
Preventing adversaries from compiling significant amounts of user data is a major challenge for social network operators. We examine the difficulty of collecting profile and ...
Joseph Bonneau, Jonathan Anderson, George Danezis