Sciweavers

EMNLP
2010

Improving Gender Classification of Blog Authors

13 years 9 months ago
Improving Gender Classification of Blog Authors
The problem of automatically classifying the gender of a blog author has important applications in many commercial domains. Existing systems mainly use features such as words, word classes, and POS (part-ofspeech) n-grams, for classification learning. In this paper, we propose two new techniques to improve the current result. The first technique introduces a new class of features which are variable length POS sequence patterns mined from the training data using a sequence pattern mining algorithm. The second technique is a new feature selection method which is based on an ensemble of several feature selection criteria and approaches. Empirical evaluation using a real-life blog data set shows that these two techniques improve the classification accuracy of the current state-ofthe-art methods significantly.
Arjun Mukherjee, Bing Liu
Added 11 Feb 2011
Updated 11 Feb 2011
Type Journal
Year 2010
Where EMNLP
Authors Arjun Mukherjee, Bing Liu
Comments (0)