ZOT! to Wikipedia Vandalism - Lab Report for PAN at CLEF 2010

Abstract: This vandalism detector uses features derived primarily from a word-preserving differencing of each Wikipedia article's text before and after the edit, along with a few metadata features and statistics on the before and after text. Features computed from the text difference combine statistics such as length, markup count, and blanking with a selected number of TF-IDF values for words and bigrams. Our training set was expanded from the one supplied for the shared task to include the 5K vandalism edit corpus from West et al. Vandalism edits in the training set that were classified as "regular" by a classifier trained on all the data were removed from the training set used for the final classifier. Classification was performed using bagging of the Weka J48graft (C4.5) decision tree [3], which resulted in an evaluation score of 0.84340 AUC. It is unclear whether the expanded vandalism data improved or degraded performance because that changed...
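
As a rough illustration of the kind of word-level differencing and difference statistics the abstract mentions, the following minimal sketch uses Python's standard `difflib`. The tokenizer, the wiki-markup pattern, the function name `diff_features`, and the feature names are all assumptions made for this example; this is not the authors' implementation, which the abstract says used Weka for classification.

```python
import difflib
import re

# Hypothetical tokenizer: runs of word characters, or single punctuation marks.
WORD_RE = re.compile(r"\w+|[^\w\s]")
# Rough, illustrative pattern for common wiki markup (links, templates, quotes, headings, tags).
MARKUP_RE = re.compile(r"\[\[|\]\]|\{\{|\}\}|''+|==+|<[^>]+>")


def diff_features(before: str, after: str) -> dict:
    """Compute simple word-level difference features for one edit (illustrative only)."""
    old_tokens = WORD_RE.findall(before)
    new_tokens = WORD_RE.findall(after)

    # Compare token sequences rather than characters, so whole words are kept intact.
    matcher = difflib.SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    inserted, deleted = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            deleted.extend(old_tokens[i1:i2])
        if tag in ("replace", "insert"):
            inserted.extend(new_tokens[j1:j2])

    return {
        "inserted_words": inserted,
        "deleted_words": deleted,
        # Length statistic: total characters in the inserted tokens.
        "inserted_length": sum(len(tok) for tok in inserted),
        # Change in the number of markup constructs between the two revisions.
        "markup_delta": len(MARKUP_RE.findall(after)) - len(MARKUP_RE.findall(before)),
        # Blanking: the edit removed all visible text from a non-empty article.
        "is_blanking": bool(before.strip()) and not after.strip(),
    }


if __name__ == "__main__":
    features = diff_features(
        "The cat sat on the mat.",
        "The cat sat on the [[stupid]] mat.",
    )
    print(features)
```

In a pipeline like the one described, the inserted and deleted words could then feed a TF-IDF weighting step, with the resulting vectors and statistics passed to a classifier.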

James White, Rebecca Maessen

CLEF 2010 | Information Technology | Training Set | Vandalism Detector | Vandalism Edits |

Post Info

Added	08 Nov 2010
Updated	08 Nov 2010
Type	Conference
Year	2010
Where	CLEF
Authors	James White, Rebecca Maessen
