Sciweavers

LREC
2008

Using the Complexity of the Distribution of Lexical Elements as a Feature in Authorship Attribution

Traditional authorship attribution models extract normalized counts of lexical elements such as nouns, common words and punctuation and use these normalized counts or ratios as features for author fingerprinting. The text is viewed as a "bag-of-words", and the order of words and their position relative to other words are largely ignored. We propose a new method of feature extraction which quantifies the distribution of lexical elements within the text using Kolmogorov complexity estimates. Testing carried out on blog corpora indicates that such measures outperform ratios when used as features in an SVM authorship attribution model. Moreover, by adding complexity ...
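The contrast the abstract draws between ratio features and complexity features can be illustrated with a minimal sketch. The abstract does not say how the complexity estimates are computed, so everything here is an assumption made for illustration: the text is encoded as a string of 1s and 0s marking where a lexical element occurs, and the compressed size under `zlib` stands in as a rough proxy for Kolmogorov complexity. The names `element_mask`, `complexity_estimate`, `ratio_feature`, `features` and the `COMMON_WORDS` set are hypothetical, not taken from the paper.

```python
import zlib

# Hypothetical, tiny stand-in for a "common words" lexical class.
COMMON_WORDS = {"the", "a", "of", "and", "to", "in"}


def element_mask(tokens, is_element):
    """Encode a token sequence as '1'/'0': '1' where the token is in the lexical class."""
    return "".join("1" if is_element(tok) else "0" for tok in tokens)


def ratio_feature(mask):
    """Bag-of-words style feature: the fraction of tokens in the class (position is ignored)."""
    return mask.count("1") / len(mask) if mask else 0.0


def complexity_estimate(mask):
    """Compressed size divided by raw size.

    Assumption: zlib output length is used as a crude, computable proxy for
    Kolmogorov complexity, which itself is not computable.
    """
    if not mask:
        return 0.0
    raw = mask.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)


def features(text):
    """Return both feature types for one lexical class (common words) of one text."""
    tokens = text.lower().split()
    mask = element_mask(tokens, lambda tok: tok in COMMON_WORDS)
    return {"ratio": ratio_feature(mask), "complexity": complexity_estimate(mask)}
```

Under these assumptions, two texts with the same proportion of common words get the same ratio feature, while the complexity estimate differs when the words are spread regularly in one text and irregularly in the other. Vectors of such values, one per lexical class, are the kind of input that could then be passed to an SVM classifier.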
Leanne Spracklin, Diana Inkpen, Amiya Nayak
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2008
Where LREC