Near Similarity Search and Plagiarism Analysis

Download www.uni-weimar.de

Abstract. Existing methods for text plagiarism analysis are mainly based on “chunking”, a process of grouping a text into meaningful units, each of which is encoded by an integer. Together these numbers form a document’s signature or fingerprint. An overlap of two documents’ fingerprints indicates a possibly plagiarized text passage. Most approaches use MD5 hashes to construct fingerprints, which entails two problems: (i) it is computationally expensive, and (ii) a small chunk size must be chosen to identify matching passages, which further increases the effort for fingerprint computation, fingerprint comparison, and fingerprint storage. This paper proposes a new class of fingerprints that can be considered an abstraction of the classical vector space model. These fingerprints operationalize the concept of “near similarity” and enable one to quickly identify candidate passages for plagiarism. Experiments show that a plagiarism analysis based on our ...
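To make the chunking baseline concrete, here is a minimal sketch of the classical MD5 approach the abstract describes, not the near-similarity fingerprints the paper proposes. It assumes that a "chunk" is an overlapping run of k words (k = 3 here) and that each chunk's MD5 digest is truncated to 64 bits to give its integer code; the function names `chunks`, `md5_fingerprint`, and `shared_chunks` are hypothetical.

```python
import hashlib
import re


def chunks(text: str, k: int = 3) -> list[str]:
    """Split text into overlapping word k-grams (assumed chunking scheme)."""
    words = re.findall(r"\w+", text.lower())
    return [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]


def md5_fingerprint(text: str, k: int = 3) -> set[int]:
    """Encode each chunk as an integer (first 8 bytes of its MD5 digest).

    The set of these integers serves as the document's fingerprint.
    """
    return {
        int.from_bytes(hashlib.md5(c.encode("utf-8")).digest()[:8], "big")
        for c in chunks(text, k)
    }


def shared_chunks(a: str, b: str, k: int = 3) -> int:
    """Count fingerprint values two documents share; overlap hints at reuse."""
    return len(md5_fingerprint(a, k) & md5_fingerprint(b, k))


if __name__ == "__main__":
    original = "The quick brown fox jumps over the lazy dog near the river bank."
    suspect = "Yesterday the quick brown fox jumps over the lazy cat."
    print(shared_chunks(original, suspect))  # prints 6
```

The parameter `k` shows the trade-off named in problem (ii): a smaller chunk size can detect shorter copied passages, but it produces more chunks per document, and therefore more hashes to compute, compare, and store.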

Benno Stein, Sven Meyer zu Eissen

Analysis Mainly Base | Analysis Plagiarism | GFKL 2005 | Plagiarism Analysis |

Post Info

Added	27 Jun 2010
Updated	27 Jun 2010
Type	Conference
Year	2005
Where	GFKL
Authors	Benno Stein, Sven Meyer zu Eissen

Sciweavers