On Authorship Attribution via Markov Chains and Sequence Kernels

We investigate the use of recently proposed character and word sequence kernels for the task of authorship attribution and compare their performance with two probabilistic approaches based on Markov chains of characters and words. Several configurations of the sequence kernels are studied using a relatively large dataset in which each author covered several topics. Utilising Moffat smoothing, the two probabilistic approaches obtain similar performance, which in turn is comparable to that of character sequence kernels and better than that of word sequence kernels. The results further suggest that, in a realistic setup that accounts for texts not written by any of the hypothesised authors, about 5000 reference words are required to obtain good discrimination performance.
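
The abstract does not give the model equations. As a rough illustration only, the Python sketch below trains one character-level Markov chain per author and attributes a text to the author whose model gives the highest average log-likelihood. The smoothing is assumed to take the interpolated form associated with Moffat (PPM escape method C), where the weight given to the lower-order estimate grows with the number of distinct symbols seen after a context; whether this matches the paper's exact formulation is an assumption. The names `MoffatMarkovChain`, `attribute` and `ALPHABET`, the chain order and the toy texts are all hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical toy alphabet: lowercase letters plus space.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "


class MoffatMarkovChain:
    """Character Markov chain of a given order with assumed Moffat-style smoothing."""

    def __init__(self, order, alphabet_size):
        self.order = order
        self.alphabet_size = alphabet_size
        # counts[k][context][symbol]: times `symbol` followed a length-k `context`.
        self.counts = [defaultdict(lambda: defaultdict(int)) for _ in range(order + 1)]

    def train(self, text):
        for i, symbol in enumerate(text):
            for k in range(self.order + 1):
                if i - k < 0:
                    break  # not enough history for longer contexts
                self.counts[k][text[i - k:i]][symbol] += 1

    def prob(self, context, symbol):
        # Start from a uniform estimate, then interpolate upwards through orders 0..order.
        p = 1.0 / self.alphabet_size
        for k in range(self.order + 1):
            if len(context) < k:
                break
            followers = self.counts[k].get(context[len(context) - k:])
            if not followers:
                continue  # unseen context: keep the lower-order estimate
            total = sum(followers.values())
            distinct = len(followers)  # number of distinct symbols seen after this context
            p = (followers.get(symbol, 0) + distinct * p) / (total + distinct)
        return p

    def log_likelihood(self, text):
        """Average per-character log-likelihood of `text` under this model."""
        total = 0.0
        for i, symbol in enumerate(text):
            context = text[max(0, i - self.order):i]
            total += math.log(self.prob(context, symbol))
        return total / max(1, len(text))


def attribute(models, text):
    """Return the author whose model scores `text` highest (closed-set attribution)."""
    return max(models, key=lambda author: models[author].log_likelihood(text))


models = {
    "A": MoffatMarkovChain(order=2, alphabet_size=len(ALPHABET)),
    "B": MoffatMarkovChain(order=2, alphabet_size=len(ALPHABET)),
}
models["A"].train("the cat sat on the mat and the cat ate the rat")
models["B"].train("a quick brown fox jumps over a lazy dog quickly")
print(attribute(models, "the rat sat on the cat"))
```

This sketch only covers the closed-set case. The open-set setup mentioned in the abstract, where a text may come from none of the hypothesised authors, would additionally need a rejection rule, for example thresholding each author's score relative to a background model; that rule is not shown here and is an assumption rather than the paper's stated method.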

Conrad Sanderson, Simon Günter

Character Sequence Kernels | Computer Vision | ICPR 2006 | Probabilistic Approaches | Word Sequence Kernels

Added	09 Nov 2009
Updated	09 Nov 2009
Type	Conference
Year	2006
Where	ICPR
Authors	Conrad Sanderson, Simon Günter