Missing web pages, URIs that return the 404 “Page Not Found” error or the HTTP response code 200 but dereference unexpected content, are ubiquitous in today’s browsing exper...
Martin Klein, Jeffery L. Shipman, Michael L. Nelso...
World Wide Web search engines typically return thousands of results to the users. To avoid users browsing through the whole list of results, search engines use ranking algorithms ...
We consider the problem of dust: Different URLs with Similar Text. Such duplicate URLs are prevalent in web sites, as web server software often uses aliases and redirections, and...
With the exponential growth of Web 2.0 applications, tags have been used extensively to describe the image contents on the Web. Due to the noisy and sparse nature in the human gene...
Peer-To-Peer (P2P) networks like Gnutella improve some shortcomings of Conventional Search Engines (CSE) such as centralized and outdated indexing by distributing the search engin...