The use of blogs to track and comment on real world (political, news, entertainment) events is growing. Similarly, as more individuals start relying on the Web as their primary in...
Traditional content-based e-mail spam filtering takes into account content of e-mail messages and apply machine learning techniques to infer patterns that discriminate spams from...
We present an algorithm, witch, that learns to detect spam hosts or pages on the Web. Unlike most other approaches, it simultaneously exploits the structure of the Web graph as we...
Jacob Abernethy, Olivier Chapelle, Carlos Castillo
What makes template content in the Web so special that we need to remove it? In this paper I present a large-scale aggregate analysis of textual Web content, corroborating statist...
In this paper we perform a study of the image contents of the Chilean web (.cl domain) using automatic feature extraction, content-based analysis and face detection algorithms. In...
Alejandro Jaimes, Javier Ruiz-del-Solar, Rodrigo V...