Sciweavers

WWW
2006
ACM
14 years 9 months ago
Effective web-scale crawling through website analysis
The web crawler space is often delimited into two general areas: full-web crawling and focused crawling. We present netSifter, a crawler system which integrates features from thes...
Iván González, Adam Marcus 0002, Daniel N. M...
WWW
2007
ACM
14 years 9 months ago
A large-scale study of robots.txt
Search engines largely rely on Web robots to collect information from the Web. Due to the unregulated open-access nature of the Web, robot activities are extremely diverse. Such c...
Yang Sun, Ziming Zhuang, C. Lee Giles
EMNLP
2009
13 years 6 months ago
Web-Scale Distributional Similarity and Entity Set Expansion
Computing the pairwise semantic similarity between all words on the Web is a computationally challenging task. Parallelization and optimizations are necessary. We propose a highly...
Patrick Pantel, Eric Crestan, Arkady Borkovsky, An...
SIGIR
2006
ACM
14 years 2 months ago
AggregateRank: bringing order to web sites
Since the website is one of the most important organizational structures of the Web, how to effectively rank websites has been essential to many Web applications, such as Web sear...
Guang Feng, Tie-Yan Liu, Ying Wang, Ying Bao, Zhim...
ASSETS
2004
ACM
14 years 1 month ago
Accessibility of Internet websites through time
Using Internet Archive’s Wayback Machine, a random sample of websites from 1997-2002 were retrospectively analyzed for effects that technology has on accessibility for persons w...
Stephanie Hackett, Bambang Parmanto, Xiaoming Zeng