Clustering by document concepts is a powerful way of retrieving information from a large number of documents. In general, this task makes no assumptions about the data distrib...
generally meta-data, so that documents on any specific subject can be transparently retrieved. While quality control can in principle still rely on the traditional methods of peer-...
The document-length normalization problem has been widely studied in the field of Information Retrieval. The Cosine Normalization [2], the Maximum tf Normalization [1] and the By...
Sylvain Lamprier, Tassadit Amghar, Bernard Levrat,...
In this paper we study duplicates on the Web, using collections containing documents from all sites under the .cl domain that represent accurate and representative subsets of the We...
We simulate different architectures of a distributed Information Retrieval system on a very large Web collection, in order to work out the optimal setting for a particular set of r...