

ZOT! to Wikipedia Vandalism - Lab Report for PAN at CLEF 2010

Abstract This vandalism detector uses features derived primarily from a word-preserving differencing of each Wikipedia article's text before and after the edit, along with a few metadata features and statistics on the before and after text. Features computed from the text difference combine statistics such as length, markup count, and blanking with a selected number of TF-IDF values for words and bigrams. Our training set was expanded from that supplied for the shared task to include the 5K vandalism edit corpus from West et al. Vandalism edits in the training set that were classified as "regular" by a classifier trained on all the data were removed from the training set used for the final classifier. Classification was performed using bagging of the Weka J48graft (C4.5) decision tree [3], which resulted in an evaluation score of 0.84340 AUC. It is unclear whether the expanded vandalism data improved or degraded performance because that changed...
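As a rough illustration of the kind of diff-derived statistics the abstract mentions (length, markup count, blanking), the following is a minimal sketch and not the authors' implementation. It assumes a whitespace tokenizer and uses Python's standard `difflib.SequenceMatcher` as a stand-in for the word-preserving differencing; the function name `diff_features`, the feature names, and the markup pattern are all hypothetical choices made for this example.

```python
import difflib
import re

# Hypothetical pattern for a few common wiki markup tokens (assumption, not from the paper).
MARKUP_PATTERN = re.compile(r"\[\[|\]\]|\{\{|\}\}|'{2,}|={2,}")


def diff_features(before: str, after: str) -> dict:
    """Compute simple statistics over the word-level difference of two revisions."""
    old_words = before.split()
    new_words = after.split()

    # Word-level diff; difflib stands in for the paper's differencing step.
    matcher = difflib.SequenceMatcher(None, old_words, new_words, autojunk=False)

    inserted, removed = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            removed.extend(old_words[i1:i2])
        if tag in ("replace", "insert"):
            inserted.extend(new_words[j1:j2])

    inserted_text = " ".join(inserted)
    return {
        "inserted_words": len(inserted),
        "removed_words": len(removed),
        "inserted_chars": len(inserted_text),
        # Markup tokens counted only in the inserted text.
        "markup_count": len(MARKUP_PATTERN.findall(inserted_text)),
        # Blanking: the edit removed all text from a non-empty article.
        "blanked": bool(old_words) and not new_words,
    }
```

A feature dictionary of this shape could then be combined with TF-IDF values and passed to a bagged decision-tree learner, as the abstract describes; that stage is omitted here.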
James White, Rebecca Maessen
Added 08 Nov 2010
Updated 08 Nov 2010
Type Conference
Year 2010
Where CLEF