Abstract. The number of features in a text classification system is given by the size of the vocabulary, which is typically in the tens or hundreds of thousands even for small tasks. This leads to parameter estimation problems for statistics-based methods, and countermeasures have to be found. One of the most widely used approaches is to reduce the size of the vocabulary according to a well-defined criterion so that the set of parameters can be reliably estimated. The same problem arises in the field of language modeling, where several smoothing techniques have been developed to address it. In this paper we show that using the full vocabulary, together with a suitable choice of smoothing technique for the text classification task, yields better results than the standard feature selection techniques.

Key words: Text Classification, Naive Bayes, Multinomial Distribution, Feature Selection, Smoothing, Length Normalization
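To make the contrast concrete: the abstract does not name a specific smoothing technique, so the sketch below is a hedged illustration of the general setup, a multinomial Naive Bayes classifier that keeps the full vocabulary and uses add-alpha (Laplace) smoothing, one of the standard choices from language modeling. All function and variable names are illustrative, not taken from the paper.

```python
from collections import Counter, defaultdict
import math

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Train a multinomial Naive Bayes model over the full vocabulary.

    docs: list of token lists; labels: parallel list of class labels.
    alpha: add-alpha (Laplace) smoothing constant -- illustrative choice.
    """
    vocab = {t for doc in docs for t in doc}       # full vocabulary, no feature selection
    class_docs = Counter(labels)                   # document counts per class
    token_counts = defaultdict(Counter)            # per-class token frequencies
    for doc, y in zip(docs, labels):
        token_counts[y].update(doc)

    n_docs = len(docs)
    model = {}
    for y in class_docs:
        total = sum(token_counts[y].values())
        denom = total + alpha * len(vocab)         # smoothing spreads mass over unseen words
        model[y] = {
            "prior": math.log(class_docs[y] / n_docs),
            "logprob": {t: math.log((token_counts[y][t] + alpha) / denom)
                        for t in vocab},
            "unseen": math.log(alpha / denom),     # fallback for words absent from class y
        }
    return model

def classify(model, doc):
    """Return the class c maximizing log P(c) + sum over tokens t of log P(t|c)."""
    def score(y):
        m = model[y]
        return m["prior"] + sum(m["logprob"].get(t, m["unseen"]) for t in doc)
    return max(model, key=score)
```

Without the smoothing term, any word unseen in a class would force a zero probability for that class, which is why reliable estimation over the full vocabulary hinges on the choice of smoothing.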