Using Clustering to Identify Outlier Chunks of Text - Notebook for PAN at CLEF 2011

Intrinsic plagiarism detection is a sub-task of authorship identification in which outlier chunks must be detected solely on the basis of stylistic differences from the main body of the text. We present a first attempt at utilizing words that appear infrequently in a text as stylistic markers for distinguishing outlier chunks in the text. In the first phase of our method we cluster chunks of text represented by usage of infrequent words. In the second phase, we use a training corpus to identify cluster properties of outlier chunks.
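
The two phases described above can be illustrated with a minimal sketch. It is not the author's implementation: the rarity threshold (`max_chunk_count`), the binary presence vectors, the cosine-based two-cluster loop, and every function name are assumptions made for illustration. In particular, the abstract says the second phase learns cluster properties of outlier chunks from a training corpus; the sketch swaps in a much simpler stand-in that flags the smaller cluster.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def chunk_vectors(chunks, max_chunk_count):
    """Represent each chunk by presence/absence of infrequent words.

    Assumption: "infrequent" means the word occurs in at least 2 and at most
    max_chunk_count chunks (words seen in one chunk cannot link chunks).
    """
    chunk_sets = [set(chunk) for chunk in chunks]
    doc_freq = Counter(word for s in chunk_sets for word in s)
    vocab = sorted(w for w, n in doc_freq.items() if 2 <= n <= max_chunk_count)
    return [[1.0 if w in s else 0.0 for w in vocab] for s in chunk_sets]


def two_means(vectors, iterations=10):
    """Phase 1 (assumed variant): split chunks into two clusters by cosine similarity."""
    # Deterministic seeding: chunk 0 and the chunk least similar to it.
    far = min(range(len(vectors)), key=lambda i: cosine(vectors[0], vectors[i]))
    centroids = [vectors[0], vectors[far]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [
            0 if cosine(v, centroids[0]) >= cosine(v, centroids[1]) else 1
            for v in vectors
        ]
        for k in (0, 1):
            members = [v for v, label in zip(vectors, labels) if label == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def flag_outliers(labels):
    """Phase 2 stand-in (assumption): treat the smaller cluster as the outliers."""
    counts = Counter(labels)
    if len(counts) < 2:
        return []
    minority = min(counts, key=counts.get)
    return [i for i, label in enumerate(labels) if label == minority]


# Toy input: three chunks in one style, two in another (hypothetical data).
chunks = [
    "the quaint harbour slept under grey mist".split(),
    "grey mist hid the quaint harbour again".split(),
    "the harbour mist was quaint and grey".split(),
    "kernel scheduler threads preempt the mutex".split(),
    "mutex threads and kernel scheduler stall".split(),
]
labels = two_means(chunk_vectors(chunks, max_chunk_count=3))
suspects = flag_outliers(labels)
print(labels, suspects)
```

On real documents the chunks would be far longer and the minority-cluster rule would likely be too crude, which is presumably why the described method relies on a training corpus for the second phase.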

Navot Akiva

CLEF 2011 | Cluster Properties | Information Technology | Plagiarism Detection | Stylistic Differences

Post Info

Added	18 Dec 2011
Updated	18 Dec 2011
Type	Journal
Year	2011
Where	CLEF
Authors	Navot Akiva
