SIGMOD
2003
ACM

Winnowing: Local Algorithms for Document Fingerprinting

Digital content is for copying: quotation, revision, plagiarism, and file sharing all create copies. Document fingerprinting is concerned with accurately identifying copying, including small partial copies, within large sets of documents. We introduce the class of local document fingerprinting algorithms, which seems to capture an essential property of any fingerprinting technique guaranteed to detect copies. We prove a novel lower bound on the performance of any local algorithm. We also develop winnowing, an efficient local fingerprinting algorithm, and show that winnowing's performance is within 33% of the lower bound. Finally, we give experimental results on Web data and report experience with MOSS, a widely used plagiarism detection service.
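
The abstract names winnowing but does not spell out the procedure. The following is a minimal sketch, assuming the commonly described formulation: hash every k-character substring, slide a window of w consecutive hashes over the result, and keep the minimum hash of each window (the rightmost one on ties) as a fingerprint. The function names, the SHA-1 based hash, and the parameters `k` and `w` are illustrative assumptions and are not taken from this listing.

```python
import hashlib


def kgram_hashes(text, k):
    """Hash every k-character substring of text.

    The truncated SHA-1 hash is an illustrative choice, not the paper's.
    """
    return [
        int.from_bytes(
            hashlib.sha1(text[i:i + k].encode("utf-8")).digest()[:4], "big"
        )
        for i in range(len(text) - k + 1)
    ]


def winnow(hashes, w):
    """Return (position, hash) fingerprints selected from a hash sequence.

    For each window of w consecutive hashes, select the minimum value,
    preferring the rightmost occurrence on ties. A position is recorded
    only once even if it is the minimum of several overlapping windows.
    """
    fingerprints = []
    last_pos = -1
    for start in range(len(hashes) - w + 1):
        window = hashes[start:start + w]
        smallest = min(window)
        # Rightmost index of the minimum within this window.
        offset = w - 1 - window[::-1].index(smallest)
        pos = start + offset
        if pos != last_pos:
            fingerprints.append((pos, smallest))
            last_pos = pos
    return fingerprints


# Hypothetical hash sequence, chosen only to illustrate the selection rule.
sample = [77, 74, 42, 17, 98, 50, 17, 98, 8, 88, 67, 39, 77, 74, 42, 17, 98]
print(winnow(sample, 4))
# prints [(3, 17), (6, 17), (8, 8), (11, 39), (15, 17)]
```

Because the selection inside a window depends only on that window's contents, two documents that share a run of at least w + k - 1 characters share at least one selected hash under this sketch, which is the kind of locality the abstract refers to.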
Added: 08 Dec 2009
Updated: 08 Dec 2009
Type: Conference
Year: 2003
Where: SIGMOD
Authors: Saul Schleimer, Daniel Shawcross Wilkerson, Alexander Aiken