Sciweavers

Query: A Lightweight and Efficient Tool for Cleaning Web Pages

WEBI 2010, Springer
Lightweight Clustering Methods for Webspam Demotion
To make sure they can quickly respond to a specific query, the main search engines have several mechanisms. One of them consists in ranking web pages according to their i...
Thomas Largillier, Sylvain Peyronnet

KDD 2008, ACM
DiMaC: a disguised missing data cleaning tool
In some applications such as filling in a customer information form on the web, some missing values may not be explicitly represented as such, but instead appear as potentially va...
Ming Hua, Jian Pei

CIKM 2006, Springer
A fast and robust method for web page template detection and removal
The widespread use of templates on the Web is considered harmful for two main reasons. Not only do they compromise the relevance judgment of many web IR and web mining methods suc...
Karane Vieira, Altigran Soares da Silva, Nick Pint...

LREC 2010
BlogBuster: A Tool for Extracting Corpora from the Blogosphere
This paper presents BlogBuster, a tool for extracting a corpus from the blogosphere. The topic of cleaning arbitrary web pages with the goal of extracting a corpus from web data, ...
Georgios Petasis, Dimitrios Petasis

HT 2009, ACM
WebNC: efficient sharing of web applications
WebNC is a system for efficiently sharing, retrieving and viewing web applications. Unlike existing screencasting and screensharing tools, WebNC is optimized to work with web page...
Laurent Denoue, John Adcock, Scott Carter, Gene Go...