Web spam detection has become one of the top challenges for the Internet search industry. Instead of using some heuristic rules, we propose a feature re-extraction strategy to opt...
To simplify the task of obtaining information from the vast number of information sources that are available on the World Wide Web (WWW), we are building tools to build informatio...
Like search engines, recommender systems have become a tool that cannot be ignored by websites with a large selection of products, music, news or simply webpages links. The perform...
In this paper, we investigate how modeling content structure can benefit text analysis applications such as extractive summarization and sentiment analysis. This follows the lingu...
Motivated by our work with political scientists who need to manually analyze large Web archives of news sites, we present SpotSigs, a new algorithm for extracting and matching sig...
Martin Theobald, Jonathan Siddharth, Andreas Paepc...