Language of Vandalism: Improving Wikipedia Vandalism Detection via Stylometric Analysis

Community-based knowledge forums, such as Wikipedia, are susceptible to vandalism, i.e., ill-intentioned contributions that are detrimental to the quality of collective intelligence. Most previous work relies on shallow lexico-syntactic patterns and metadata to automatically detect vandalism in Wikipedia. In this paper, we explore more linguistically motivated approaches to vandalism detection. In particular, we hypothesize that textual vandalism constitutes a unique genre in which a group of people share similar linguistic behavior. Experimental results suggest that (1) statistical models provide evidence of unique language styles in vandalism, and that (2) deep syntactic patterns based on probabilistic context-free grammars (PCFG) discriminate vandalism more effectively than shallow lexico-syntactic patterns based on n-grams.
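To make the idea of a "statistical model of language style" concrete, the following is a minimal sketch, not the authors' implementation. It assumes one add-one smoothed bigram language model per class and labels an edit by whichever model assigns it the higher log-likelihood. The class name `BigramLM`, the function `classify`, and the toy training sentences are all hypothetical. The PCFG-based features described in the abstract would additionally require a syntactic parser and are not shown here.

```python
import math
from collections import Counter


def bigrams(tokens):
    """Return adjacent token pairs, padded with sentence-boundary markers."""
    padded = ["<s>"] + tokens + ["</s>"]
    return list(zip(padded, padded[1:]))


class BigramLM:
    """Add-one smoothed bigram language model (illustrative assumption)."""

    def __init__(self, sentences):
        self.bigram_counts = Counter()
        self.context_counts = Counter()
        self.vocab = {"<s>", "</s>"}
        for sentence in sentences:
            tokens = sentence.lower().split()
            self.vocab.update(tokens)
            for prev, cur in bigrams(tokens):
                self.bigram_counts[(prev, cur)] += 1
                self.context_counts[prev] += 1

    def log_prob(self, sentence, vocab_size):
        """Log-likelihood of a sentence under add-one smoothing."""
        total = 0.0
        for prev, cur in bigrams(sentence.lower().split()):
            numerator = self.bigram_counts[(prev, cur)] + 1
            denominator = self.context_counts[prev] + vocab_size
            total += math.log(numerator / denominator)
        return total


def classify(text, vandal_lm, regular_lm):
    """Pick the class whose language model scores the text higher."""
    # Shared vocabulary size, plus one slot for unseen tokens.
    vocab_size = len(vandal_lm.vocab | regular_lm.vocab) + 1
    vandal_score = vandal_lm.log_prob(text, vocab_size)
    regular_score = regular_lm.log_prob(text, vocab_size)
    return "vandalism" if vandal_score > regular_score else "regular"


# Hypothetical toy data, invented purely for illustration.
VANDAL_EDITS = ["this page is so dumb lol", "you all suck lol"]
REGULAR_EDITS = ["the city was founded in 1820",
                 "the river flows through the city"]

vandal_lm = BigramLM(VANDAL_EDITS)
regular_lm = BigramLM(REGULAR_EDITS)

print(classify("this is so dumb", vandal_lm, regular_lm))
print(classify("the city was founded", vandal_lm, regular_lm))
```

In a realistic setting the per-class scores would more likely be used as features for a downstream classifier alongside metadata, rather than compared directly as done here.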

Manoj Harpalani, Michael Hart, Sandesh Singh, Rob Johnson, Yejin Choi

ACL 2011 | Collective Intelligence | Computational Linguistics | Language Styles | Syntactic Patterns

Post Info

Added	24 Aug 2011
Updated	24 Aug 2011
Type	Journal
Year	2011
Where	ACL
Authors	Manoj Harpalani, Michael Hart, Sandesh Singh, Rob Johnson, Yejin Choi
