A Textual-Based Similarity Approach for Efficient and Scalable External Plagiarism Analysis - Lab Report for PAN at CLEF 2010



In this paper we present an approach to external plagiarism detection based on textual similarity. The method is efficient and precise and can be applied to large document collections. The system first runs a document selection phase that uses a variant of tf-idf computed over the terms that appear in the two documents of each pair being compared. It then applies a more complex and accurate similarity function, based on character n-grams, to the subset of documents selected in the first phase in order to extract the plagiarized passages, or matches. Once all matches for a given document have been extracted, a greedy merging step joins nearby matches while tolerating the unmatched text between them, which makes the system robust to certain levels of plagiarism obfuscation. In the 2nd International Competition on Plagiarism Detection, the system achieved an overall score of 0.2222, ranking 11th out of 18 participants.
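The three-stage pipeline described in the abstract (tf-idf candidate selection, character n-gram matching, greedy match merging) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the tf-idf variant, the n-gram length, the exact n-gram similarity function, or the merging threshold. The helper names (`terms`, `select_candidates`, `ngram_matches`, `merge_matches`, `detect`), the smoothed idf formula, the seed-and-extend exact matching used here in place of the paper's n-gram function, and the defaults `n=16` and `max_gap=200` are all hypothetical.

```python
import math
from collections import Counter


def terms(text):
    # Hypothetical tokenizer (assumption): whitespace split, strip punctuation, lowercase.
    return [w for w in (t.strip(".,;:!?\"'()").lower() for t in text.split()) if w]


def pair_score(suspicious, source, df, n_docs):
    # Stage 1, assumed tf-idf variant: sum idf-weighted counts of the terms
    # that occur in both documents of the pair.
    s_tf, r_tf = Counter(terms(suspicious)), Counter(terms(source))
    score = 0.0
    for t in s_tf.keys() & r_tf.keys():
        idf = math.log((n_docs + 1) / (df[t] + 1)) + 1.0  # smoothed idf (assumption)
        score += min(s_tf[t], r_tf[t]) * idf
    return score


def select_candidates(suspicious, sources, k):
    # Return indices of the k best-scoring sources, dropping zero-score pairs.
    df = Counter()
    for doc in sources:
        df.update(set(terms(doc)))
    scores = [pair_score(suspicious, doc, df, len(sources)) for doc in sources]
    ranked = sorted(range(len(sources)), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked if scores[i] > 0][:k]


def ngram_matches(susp, src, n=16):
    # Stage 2 stand-in (assumption): seed on a shared character n-gram,
    # then extend the match character by character.
    index = {}
    for j in range(len(src) - n + 1):
        index.setdefault(src[j:j + n], []).append(j)
    matches, i = [], 0
    while i <= len(susp) - n:
        js = index.get(susp[i:i + n])
        if not js:
            i += 1
            continue
        j, length = js[0], n
        while (i + length < len(susp) and j + length < len(src)
               and susp[i + length] == src[j + length]):
            length += 1
        # (susp_start, susp_end, src_start, src_end), end offsets exclusive
        matches.append((i, i + length, j, j + length))
        i += length  # skip past the extended match
    return matches


def merge_matches(matches, max_gap=200):
    # Stage 3: greedily join consecutive matches whose gaps in both documents
    # are at most max_gap characters (threshold is an assumption).
    merged = []
    for s0, s1, r0, r1 in sorted(matches):
        if merged:
            ps0, ps1, pr0, pr1 = merged[-1]
            if s0 - ps1 <= max_gap and 0 <= r0 - pr1 <= max_gap:
                merged[-1] = (ps0, max(ps1, s1), pr0, max(pr1, r1))
                continue
        merged.append((s0, s1, r0, r1))
    return merged


def detect(susp, sources, k=5, n=16, max_gap=200):
    # Full pipeline: candidate selection, n-gram matching, greedy merging.
    results = []
    for idx in select_candidates(susp, sources, k):
        for m in merge_matches(ngram_matches(susp, sources[idx], n), max_gap):
            results.append((idx,) + m)
    return results


if __name__ == "__main__":
    src = "alpha beta gamma delta epsilon zeta eta theta iota kappa"
    susp = "xx alpha beta gamma delta EPSILON zeta eta theta iota kappa yy"
    other = "completely unrelated words about cooking pasta and sauce"
    print(detect(susp, [src, other], k=5, n=10))  # → [(0, 3, 59, 0, 56)]
```

In the usage example the altered word `EPSILON` splits the copied text into two exact matches; the merging step joins them into one passage because the 7-character gap is within `max_gap`. The merge is greedy in that matches are visited once, in suspicious-document order, and each either extends the previous span or starts a new one, with no global optimisation.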

Daniel Micol, Óscar Ferrández, Fernando Llopis, Rafael Muñoz


CLEF 2010 | Document | External Plagiarism | Information Technology | Plagiarism


Added	08 Nov 2010
Updated	08 Nov 2010
Type	Conference
Year	2010
Where	CLEF
Authors	Daniel Micol, Óscar Ferrández, Fernando Llopis, Rafael Muñoz
