Reducing the Plagiarism Detection Search Space on the Basis of the Kullback-Leibler Distance

Abstract. Automatic plagiarism detection considering a reference corpus compares a suspicious text to a set of original documents in order to relate the plagiarised fragments to their potential source. Publications on this task often assume that the search space (the set of reference documents) is a narrow set where any search strategy will produce a good output in a short time. However, this is not always true. Reference corpora are often composed of a big set of original documents where a simple exhaustive search strategy becomes practically impossible. Before carrying out an exhaustive search, it is necessary to reduce the search space, represented by the documents in the reference corpus, as much as possible. Our experiments with the METER corpus show that a previous search space reduction stage, based on the Kullback-Leibler symmetric distance, reduces the search process time dramatically. Additionally, it improves the Precision and Recall obtained by a search strategy based on th...
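The reduction stage described in the abstract can be illustrated with a minimal sketch. The abstract as shown here does not give the formula, the features, or the smoothing, so everything below is an assumption: the distance is the standard symmetrised Kullback-Leibler form, sum over x of (P(x) - Q(x)) log(P(x) / Q(x)); documents are modelled as additively smoothed unigram distributions over a shared vocabulary; and the names term_distribution, symmetric_kl, reduce_search_space, alpha and k are hypothetical.

```python
import math
from collections import Counter


def term_distribution(text, vocabulary, alpha=1.0):
    """Additively smoothed unigram distribution over a fixed vocabulary.

    Whitespace tokenisation and alpha smoothing are illustrative assumptions.
    """
    counts = Counter(text.lower().split())
    total = sum(counts[t] for t in vocabulary) + alpha * len(vocabulary)
    return {t: (counts[t] + alpha) / total for t in vocabulary}


def symmetric_kl(p, q):
    """Symmetrised KL: sum over x of (p(x) - q(x)) * log(p(x) / q(x))."""
    return sum((p[t] - q[t]) * math.log(p[t] / q[t]) for t in p)


def reduce_search_space(suspicious, references, k=10):
    """Return the ids of the k reference documents closest to the suspicious text."""
    # Build one vocabulary so that every distribution has the same support.
    vocabulary = set(suspicious.lower().split())
    for text in references.values():
        vocabulary.update(text.lower().split())
    p = term_distribution(suspicious, vocabulary)
    scored = [
        (symmetric_kl(p, term_distribution(text, vocabulary)), doc_id)
        for doc_id, text in references.items()
    ]
    scored.sort()  # smallest distance first
    return [doc_id for _, doc_id in scored[:k]]
```

Only the k documents returned by reduce_search_space would then be handed to the more expensive exhaustive comparison, which is where the time saving reported in the abstract would come from.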

Alberto Barrón-Cedeño, Paolo Rosso, José-Miguel Benedí

CICLING 2009 | Exhaustive Search Strategy | Natural Language Processing | Search Process Time | Search Space Reduction |

Post Info

Added	24 Nov 2009
Updated	24 Nov 2009
Type	Conference
Year	2009
Where	CICLING
Authors	Alberto Barrón-Cedeño, Paolo Rosso, José-Miguel Benedí
