Sciweavers

664 search results - page 81 / 133
» The internet measurement data catalog
WWW
2007
ACM
14 years 8 months ago
A large-scale study of robots.txt
Search engines largely rely on Web robots to collect information from the Web. Due to the unregulated open-access nature of the Web, robot activities are extremely diverse. Such c...
Yang Sun, Ziming Zhuang, C. Lee Giles
WWW
2006
ACM
14 years 8 months ago
Visualizing tags over time
We consider the problem of visualizing the evolution of tags within the Flickr (flickr.com) online image sharing community. Any user of the Flickr service may append a tag to any ...
Micah Dubinko, Ravi Kumar, Joseph Magnani, Jasmine...
WWW
2006
ACM
14 years 8 months ago
XML Screamer: an integrated approach to high performance XML parsing, validation and deserialization
This paper describes an experimental system in which customized high performance XML parsers are prepared using parser generation and compilation techniques. Parsing is integrated...
Margaret Gaitatzes Kostoulas, Morris Matsa, Noah M...
WWW
2006
ACM
14 years 8 months ago
Detecting spam web pages through content analysis
In this paper, we continue our investigations of "web spam": the injection of artificially-created pages into the web in order to influence the results from search engin...
Alexandros Ntoulas, Marc Najork, Mark Manasse, Den...
WWW
2006
ACM
14 years 8 months ago
Beyond PageRank: machine learning for static ranking
Since the publication of Brin and Page's paper on PageRank, many in the Web community have depended on PageRank for the static (query-independent) ordering of Web pages. We s...
Matthew Richardson, Amit Prakash, Eric Brill