Abstract--To make sure they can quickly respond to a specific query, the main search engines have several mechanisms. One of them consists in ranking web pages according to their i...
In some applications such as filling in a customer information form on the web, some missing values may not be explicitly represented as such, but instead appear as potentially va...
The widespread use of templates on the Web is considered harmful for two main reasons. Not only do they compromise the relevance judgment of many web IR and web mining methods suc...
Karane Vieira, Altigran Soares da Silva, Nick Pint...
This paper presents BlogBuster, a tool for extracting a corpus from the blogosphere. The topic of cleaning arbitrary web pages with the goal of extracting a corpus from web data, ...
WebNC is a system for efficiently sharing, retrieving and viewing web applications. Unlike existing screencasting and screensharing tools, WebNC is optimized to work with web page...
Laurent Denoue, John Adcock, Scott Carter, Gene Go...