This paper presents a MapReduce algorithm for computing pairwise document similarity in large document collections. MapReduce is an attractive framework because it allows us to de...
We describe a system for rapidly determining document similarity among a set of documents obtained from an information retrieval (IR) system. We obtain a ranked list of the most i...
Recent work has demonstrated that the assessment of pairwise object similarity can be approached in an axiomatic manner using information theory. We extend this concept specifica...
Abstract. Improving accuracy in Information Retrieval tasks via semantic information is a complex problem characterized by three main aspects: the document representation model, th...
Roberto Basili, Marco Cammisa, Alessandro Moschitt...
Many daily activities present information in the form of a stream of text, and often people can benefit from additional information on the topic discussed. TV broadcast news can b...
Monika Rauch Henzinger, Bay-Wei Chang, Brian Milch...