Predicting quality flaws in user-generated content: the case of Wikipedia

The detection and improvement of low-quality information is a key concern in Web applications that are based on user-generated content; a popular example is the online encyclopedia Wikipedia. Existing research on quality assessment of user-generated content deals with the classification as to whether the content is high-quality or low-quality. This paper goes one step further: it targets the prediction of quality flaws, this way providing specific indications in which respects low-quality content needs improvement. The prediction is based on user-defined cleanup tags, which are commonly used in many Web applications to tag content that has some shortcomings. We apply this approach to the English Wikipedia, which is the largest and most popular user-generated knowledge source on the Web. We present an automatic mining approach to identify the existing cleanup tags, which provides us with a training corpus of labeled Wikipedia articles. We argue that common binary or multiclass clas...

Maik Anderka, Benno Stein, Nedim Lipka

English Wikipedia | Information Technology | Online Encyclopedia | Prediction Performance | SIGIR 2012

Post Info

Added	28 Sep 2012
Updated	28 Sep 2012
Type	Journal
Year	2012
Where	SIGIR
Authors	Maik Anderka, Benno Stein, Nedim Lipka
