We consider the problem of DUST: Different URLs with Similar Text. Such duplicate URLs are prevalent in web sites, as web server software often uses aliases and redirections, and...
Evaluation of IR systems has always been difficult because of the need for manually assessed relevance judgments. The advent of large editor-driven taxonomies on the web opens the...
Steven M. Beitzel, Eric C. Jensen, Abdur Chowdhury...
Robert French has argued that a disembodied computer is incapable of passing a Turing Test that includes subcognitive questions. Subcognitive questions are designed to probe the n...
A representation of the World Wide Web as a directed graph, with vertices representing web pages and edges representing hypertext links, underpins the algorithms used by web search...
Sampling is used as a universal method to reduce the running time of computations – the computation is performed on a much smaller sample and then the result is scaled to comp...