Missing web pages, URIs that return the 404 “Page Not Found” error or the HTTP response code 200 but dereference unexpected content, are ubiquitous in today’s browsing exper...
Martin Klein, Jeffery L. Shipman, Michael L. Nelso...
While much research has been performed on query logs collected for major Web search engines, query log analysis to enhance search on smaller and more focused collections has attrac...
Stephen Dignum, Udo Kruschwitz, Maria Fasli, Yunhy...
In this paper, we present a novel method for the classification of Web sites. This method exploits both structure and content of Web sites in order to discern their functionality....
In this paper we address the problem of unsupervised Web data extraction. We show that unsupervised Web data extraction becomes feasible when supposing pages that are made up of r...
The Web is a vast, dynamic source of information and resources. Because of its size and diversity, it is increasingly likely that if the information one seeks is not already there...