robots exclusion protocol

WWW 2007, ACM

Search engines largely rely on Web robots to collect information from the Web. Due to the unregulated open-access nature of the Web, robot activities are extremely diverse. Such c...

Yang Sun, Ziming Zhuang, C. Lee Giles

WWW 2008, ACM

A larger scale study of robots.txt

A website can regulate search engine crawler access to its content using the robots exclusion protocol, specified in its robots.txt file. The rules in the protocol enable the site...

Santanu Kolay
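
To illustrate the mechanism the abstract above describes, here is a minimal sketch of how a crawler might check a site's robots.txt rules before fetching a page. It uses Python's standard urllib.robotparser module. The robots.txt content, the bot names (ExampleBot, OtherBot), and the example.com URLs are all assumptions invented for illustration; they do not come from the paper.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ExampleBot
Disallow: /
"""


def build_parser(text):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A crawler with no dedicated section falls back to the "*" rules.
print(parser.can_fetch("OtherBot", "https://example.com/index.html"))      # True
print(parser.can_fetch("OtherBot", "https://example.com/private/a.html"))  # False

# ExampleBot has its own section, which disallows the entire site.
print(parser.can_fetch("ExampleBot", "https://example.com/index.html"))    # False
```

In this sketch the rules are parsed from a string so the example runs offline; a real crawler would typically point the parser at the site's actual robots.txt URL with set_url() and read() instead.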

Sciweavers