

ICDAR
1999
IEEE


Models and Algorithms for Duplicate Document Detection



This paper introduces a framework for clarifying and formalizing the duplicate document detection problem. Four distinct models are presented, each with a corresponding algorithm for its solution derived from the realm of approximate string matching. The robustness of these techniques is demonstrated through a set of experiments using data reflecting real-world degradation effects.
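The abstract names approximate string matching as the basis for the algorithms but does not describe them. As a rough illustration only, and not the paper's method, the sketch below uses plain Levenshtein edit distance to decide whether two text strings are likely duplicates despite degradation such as OCR errors. The function names and the `max_error_rate` threshold are hypothetical choices made for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic program, one row at a time."""
    prev = list(range(len(b) + 1))  # distance from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def looks_like_duplicate(a: str, b: str, max_error_rate: float = 0.2) -> bool:
    """Hypothetical test: edit distance relative to the longer string.

    The 0.2 default is an arbitrary assumption, not a value from the paper.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return True  # two empty strings are trivially identical
    return edit_distance(a, b) / longest <= max_error_rate
```

For instance, `looks_like_duplicate("duplicate document", "dup1icate docurnent")` compares a clean string against one with OCR-style confusions ("l" read as "1", "m" read as "rn"). The quadratic cost of this dynamic program makes it a sketch for short strings rather than a practical tool for whole document collections.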

Daniel P. Lopresti


Approximate String Matching | Document Analysis | Duplicate Document Detection | ICDAR 1999 | Real-world Degradation Effects


Related Content

» DogmatiX Tracks down Duplicates in XML

» Detecting Co-Derivative Documents in Large Text Collections

» Constructing a text corpus for inexact duplicate detection

» ProbClean: A probabilistic duplicate detection system

» Distributed Text Retrieval From Overlapping Collections

» SpotSigs: robust and efficient near duplicate detection in large web collections

» SDD: high performance code clone detection system for large scale source code

» Next steps in near-duplicate detection for eRulemaking

» Efficient partial-duplicate detection based on sequence matching


Added	03 Aug 2010
Updated	03 Aug 2010
Type	Conference
Year	1999
Where	ICDAR
Authors	Daniel P. Lopresti
