Sciweavers

MICAI
2007
Springer

Taking Advantage of the Web for Text Classification with Imbalanced Classes

A problem with supervised approaches to text classification is that they commonly require high-quality training data to construct an accurate classifier. Unfortunately, in many real-world applications the training sets are extremely small and have imbalanced class distributions. To address these problems, this paper proposes a novel approach to text classification that combines under-sampling with a semi-supervised learning method. In particular, the proposed semi-supervised method is specifically suited to working with very few training examples and relies on the automatic extraction of untagged data from the Web. Experimental results on a subset of the Reuters-21578 text collection indicate that the proposed approach can be a practical solution to the class-imbalance problem, since it achieves very good results using very small training sets.
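As a rough illustration of how under-sampling and a semi-supervised step can be combined, the sketch below balances a tiny labeled set and then runs a generic self-training loop. It is not the authors' method: the word-overlap scorer, the margin rule, the function names (`undersample`, `self_train`) and the parameters (`rounds`, `per_round`) are all assumptions made for this example, and an in-memory list of strings stands in for the untagged snippets that the paper extracts from the Web.

```python
import random
from collections import Counter, defaultdict


def undersample(docs, labels, seed=0):
    """Randomly drop examples so every class keeps as many as the smallest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for doc, label in zip(docs, labels):
        by_class[label].append(doc)
    n = min(len(items) for items in by_class.values())
    balanced = []
    for label, items in by_class.items():
        for doc in rng.sample(items, n):
            balanced.append((doc, label))
    return balanced


def word_profiles(labeled):
    """Count the words seen in each class (a deliberately simple stand-in classifier)."""
    profiles = defaultdict(Counter)
    for doc, label in labeled:
        profiles[label].update(doc.lower().split())
    return profiles


def score(doc, profile):
    """Fraction of the document's words that occur in the class profile."""
    words = doc.lower().split()
    return sum(1 for w in words if w in profile) / max(len(words), 1)


def self_train(labeled, unlabeled, rounds=3, per_round=1):
    """Generic self-training: each round, label the most confident untagged
    documents with the current model and add them to the training set."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        profiles = word_profiles(labeled)
        ranked = []
        for doc in pool:
            scores = sorted(
                ((score(doc, p), lab) for lab, p in profiles.items()),
                key=lambda t: t[0],
                reverse=True,
            )
            best_score, best_label = scores[0]
            second_score = scores[1][0] if len(scores) > 1 else 0.0
            # Confidence is the margin between the top two classes (an assumption).
            ranked.append((best_score - second_score, doc, best_label))
        ranked.sort(key=lambda t: t[0], reverse=True)
        for margin, doc, label in ranked[:per_round]:
            if margin <= 0:
                continue  # skip documents the model cannot separate
            labeled.append((doc, label))
            pool.remove(doc)
    return labeled
```

In this sketch, under-sampling is applied once to the labeled set before self-training starts; whether the balancing should instead be repeated after each round is a design choice the abstract does not settle.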
Added 08 Jun 2010
Updated 08 Jun 2010
Type Conference
Year 2007
Where MICAI
Authors Rafael Guzmán-Cabrera, Manuel Montes-y-Gómez, Paolo Rosso, Luis Villaseñor Pineda