Sciweavers

ACL
2008
13 years 8 months ago
Pairwise Document Similarity in Large Collections with MapReduce
This paper presents a MapReduce algorithm for computing pairwise document similarity in large document collections. MapReduce is an attractive framework because it allows us to de...
Tamer Elsayed, Jimmy J. Lin, Douglas W. Oard
HICSS
2002
IEEE
13 years 11 months ago
A Novel Method for Detecting Similar Documents
We describe a system for rapidly determining document similarity among a set of documents obtained from an information retrieval (IR) system. We obtain a ranked list of the most i...
James W. Cooper, Anni Coden, Eric W. Brown
SIGIR
2003
ACM
14 years 1 day ago
An information-theoretic measure for document similarity
Recent work has demonstrated that the assessment of pairwise object similarity can be approached in an axiomatic manner using information theory. We extend this concept specifica...
Javed A. Aslam, Meredith Frost
AIIA
2005
Springer
14 years 10 days ago
A Semantic Kernel to Exploit Linguistic Knowledge
Improving accuracy in Information Retrieval tasks via semantic information is a complex problem characterized by three main aspects: the document representation model, th...
Roberto Basili, Marco Cammisa, Alessandro Moschitt...
WWW
2003
ACM
14 years 7 months ago
Query-free news search
Many daily activities present information in the form of a stream of text, and often people can benefit from additional information on the topic discussed. TV broadcast news can b...
Monika Rauch Henzinger, Bay-Wei Chang, Brian Milch...