ICDM
2006
IEEE

Plagiarism Detection in arXiv

We describe a large-scale application of methods for finding plagiarism and self-plagiarism in research document collections. The methods are applied to a collection of 284,834 documents collected by arXiv.org over a 14-year period, covering a few different research disciplines. The methodology efficiently detects a variety of problematic author behaviors, and heuristics are developed to reduce the number of false positives. The methods are also efficient enough to implement as a real-time submission screen for a collection many times larger.
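
The abstract does not say which detection technique is used, so the following is only a generic illustration of how a text-overlap screen for new submissions might look, not the authors' method. It assumes word n-gram "shingles" compared by set containment; the function names, the n-gram length of 5, and the 0.2 threshold are all hypothetical choices made for this sketch.

```python
import re


def shingles(text, n=5):
    """Return the set of overlapping word n-grams in text (lowercased)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap(candidate, existing, n=5):
    """Fraction of the candidate's n-grams that also occur in the existing text."""
    cand = shingles(candidate, n)
    if not cand:
        return 0.0  # too short to compare
    return len(cand & shingles(existing, n)) / len(cand)


def screen(submission, corpus, n=5, threshold=0.2):
    """Return (doc_id, score) pairs at or above threshold, highest score first.

    corpus is assumed to be a dict mapping document ids to full text.
    The threshold is an arbitrary illustrative value, not one from the paper.
    """
    hits = [(doc_id, overlap(submission, text, n)) for doc_id, text in corpus.items()]
    return sorted((h for h in hits if h[1] >= threshold), key=lambda h: -h[1])
```

A screen of this kind flags any passage shared with an earlier document, including properly quoted or commonly used text, which is one reason a practical system would need heuristics of the sort the abstract mentions to cut down false positives. Recomputing shingles for every corpus document on each call is also too slow at scale; a real-time screen would precompute and index them.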
Added: 11 Jun 2010
Updated: 11 Jun 2010
Type: Conference
Year: 2006
Where: ICDM
Authors: Daria Sorokina, Johannes Gehrke, Simeon Warner, Paul Ginsparg