Using the Complexity of the Distribution of Lexical Elements as a Feature in Authorship Attribution



Traditional authorship attribution models extract normalized counts of lexical elements such as nouns, common words, and punctuation, and use these counts or ratios as features for author fingerprinting. The text is treated as a "bag of words": the order of words and their positions relative to one another are largely ignored. We propose a new method of feature extraction which quantifies the distribution of lexical elements within the text using Kolmogorov complexity estimates. Tests carried out on blog corpora indicate that such measures outperform ratios when used as features in an SVM authorship attribution model. Moreover, by adding complexity

Leanne Spracklin, Diana Inkpen, Amiya Nayak
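
The contrast the abstract draws between ratio features and complexity features can be illustrated with a minimal sketch. The abstract does not say which Kolmogorov complexity estimator the authors use, so this sketch assumes a common stand-in: the compressed size (via zlib) of a 0/1 sequence marking where a lexical element occurs. The function names, the tokenizer, and the choice of punctuation as the element are all hypothetical and are not taken from the paper.

```python
import random
import re
import zlib


def indicator_sequence(text: str, is_element) -> bytes:
    """Encode a text as bytes of '1'/'0': '1' where a token is the element of interest."""
    # Assumed tokenizer: runs of word characters, or single punctuation marks.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return bytes(ord("1") if is_element(tok) else ord("0") for tok in tokens)


def ratio_feature(seq: bytes) -> float:
    """Traditional feature: normalized count of the element (ignores position)."""
    return seq.count(b"1") / len(seq) if seq else 0.0


def complexity_feature(seq: bytes) -> float:
    """Assumed estimator: compressed length over raw length.

    Compression gives only an upper-bound proxy for Kolmogorov complexity,
    which is itself uncomputable.
    """
    return len(zlib.compress(seq, 9)) / len(seq) if seq else 0.0


def is_punctuation(token: str) -> bool:
    return re.fullmatch(r"[^\w\s]", token) is not None


# Two synthetic "texts" with the same share of commas but different placement.
regular_text = " ".join(["w w w ,"] * 500)
shuffled_tokens = ["w"] * 1500 + [","] * 500
random.Random(0).shuffle(shuffled_tokens)
irregular_text = " ".join(shuffled_tokens)

for name, text in [("regular", regular_text), ("irregular", irregular_text)]:
    seq = indicator_sequence(text, is_punctuation)
    print(name, ratio_feature(seq), complexity_feature(seq))
```

Both synthetic texts have the same comma ratio, so a ratio feature cannot separate them, while the compression-based value differs because it depends on how the commas are distributed. In a setup like the one the abstract describes, one such value per lexical element would form the feature vector passed to an SVM classifier.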


Authorship Attribution Model | Education | Lexical Elements | LREC 2008 | Normalized Counts


Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2008
Where	LREC
Authors	Leanne Spracklin, Diana Inkpen, Amiya Nayak
