Clustering and retrieval of web pages dominantly relies on analyzing either the content of individual web pages or the link structure between them. Some literature also suggests t...
The widespread use of templates on the Web is considered harmful for two main reasons. Not only do they compromise the relevance judgment of many web IR and web mining methods suc...
Karane Vieira, Altigran Soares da Silva, Nick Pint...
Identifying and tracking new information on the Web is important in sociology, marketing, and survey research, since new trends might be apparent in the new information. Such chan...
Cleaneval is a shared task and competitive evaluation on the topic of cleaning arbitrary web pages, with the goal of preparing web data for use as a corpus for linguistic and lang...
Marco Baroni, Francis Chantree, Adam Kilgarriff, S...
In this paper, an approach for the implementation of a qualitybased Web search engine is proposed. Quality retrieval is introduced and an overview on previous efforts to implement...