In a higher level task such as clustering of web results or word sense disambiguation, knowledge of all possible distinct concepts in which an ambiguous word can be expressed woul...
Web graphs are approximate snapshots of the web, created by search engines. Their creation is an error-prone procedure that relies on the availability of Internet nodes and the fa...
Panagiotis Papadimitriou 0002, Ali Dasdan, Hector ...
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...
Classification of documents by genre is typically done either using linguistic analysis or term frequency based techniques. The former provides better classification accuracy than...
The high availability of video streams is making necessary mechanisms for indexing such contents in the Web world. In this paper we focus on news programs and we propose a mechani...