ECIR
2003
Springer

Discretizing Continuous Attributes in AdaBoost for Text Categorization

Abstract. We focus on two recently proposed algorithms in the family of “boosting”-based learners for automated text classification, AdaBoost.MH and AdaBoost.MHKR. While the former is a realization of the well-known AdaBoost algorithm specifically aimed at multi-label text categorization, the latter is a generalization of the former based on the idea of learning a committee of classifier sub-committees. Both algorithms have been among the best performers in text categorization experiments so far. A problem in the use of both algorithms is that they require documents to be represented by binary vectors, indicating presence or absence of the terms in the document. As a consequence, these algorithms cannot take full advantage of the “weighted” representations (consisting of vectors of continuous attributes) that are customary in information retrieval tasks, and that provide a much more significant rendition of the document’s content than binary representations. In this pape...
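To make the representation problem concrete, the following is a minimal sketch of one generic way to turn a continuous term weight into binary attributes by interval membership. It is not the method of the paper, whose description is cut off in the abstract above. The function names `discretize_weight` and `binarize_document`, and the threshold values in the usage note, are hypothetical and chosen purely for illustration.

```python
from bisect import bisect_right


def discretize_weight(weight, thresholds):
    """Map one continuous weight to 0/1 indicators, one per interval.

    `thresholds` must be sorted ascending; it splits the real line into
    len(thresholds) + 1 intervals. Exactly one indicator is set to 1.
    (Illustrative assumption: how thresholds are chosen is left open.)
    """
    interval = bisect_right(thresholds, weight)
    return [1 if i == interval else 0 for i in range(len(thresholds) + 1)]


def binarize_document(weights, thresholds_per_term):
    """Concatenate the interval indicators of every term in a document.

    `weights[i]` is the continuous weight of term i, and
    `thresholds_per_term[i]` holds the cut points assumed for that term.
    """
    binary_vector = []
    for weight, thresholds in zip(weights, thresholds_per_term):
        binary_vector.extend(discretize_weight(weight, thresholds))
    return binary_vector
```

For example, with the assumed cut points `[0.1, 0.5]` for each of two terms, a document with weights `[0.0, 0.3]` becomes a six-element binary vector that a learner restricted to binary attributes could consume, while still retaining coarse information about how strongly each term is weighted.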
Type Conference
Year 2003
Where ECIR
Authors Pio Nardiello, Fabrizio Sebastiani, Alessandro Sperduti