Discretizing Continuous Attributes in AdaBoost for Text Categorization



Abstract. We focus on two recently proposed algorithms in the family of "boosting"-based learners for automated text classification, AdaBoost.MH and AdaBoost.MH^KR. While the former is a realization of the well-known AdaBoost algorithm specifically aimed at multi-label text categorization, the latter is a generalization of the former based on the idea of learning a committee of classifier sub-committees. Both algorithms have been among the best performers in text categorization experiments so far. A problem in the use of both algorithms is that they require documents to be represented by binary vectors, indicating presence or absence of the terms in the document. As a consequence, these algorithms cannot take full advantage of the "weighted" representations (consisting of vectors of continuous attributes) that are customary in information retrieval tasks, and that provide a much more significant rendition of the document's content than binary representations. In this pape...
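The abstract contrasts binary presence/absence vectors with weighted vectors of continuous attributes. The sketch below illustrates that contrast under stated assumptions: it uses a plain tf·idf weighting (tf × log(N/df)), and the helper names (`tfidf_vectors`, `binary_vector`, `discretize`) and the cut points are hypothetical. The `discretize` function shows one generic way to turn a single continuous weight into several binary features by thresholding. It is not the discretization method proposed in the paper, which the truncated abstract does not describe.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df).

    Assumption: plain tf*idf, chosen only for illustration.
    """
    n_docs = len(docs)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


def binary_vector(doc, vocabulary):
    """Presence/absence representation: 1 if the term occurs in the doc, else 0."""
    present = set(doc)
    return [1 if t in present else 0 for t in vocabulary]


def discretize(weight, cut_points):
    """Hypothetical threshold discretization: one binary feature per cut point,
    set to 1 when the continuous weight exceeds that threshold."""
    return [1 if weight > c else 0 for c in cut_points]


if __name__ == "__main__":
    docs = [["boost", "text", "text"], ["boost", "vector"], ["text", "category"]]
    vocab = sorted({t for d in docs for t in d})
    weights = tfidf_vectors(docs)
    print(vocab, binary_vector(docs[0], vocab))
    print(weights[0])
    print(discretize(weights[0]["text"], [0.2, 0.6, 1.0]))
```

The binary vector for the first document records only that "text" occurs, while its tf·idf weight (2 × log 1.5 ≈ 0.81) also reflects that it occurs twice; thresholding that weight at several cut points recovers some of this graded information as binary features that a binary-input learner can consume.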

Pio Nardiello, Fabrizio Sebastiani, Alessandro Spe


Algorithms | Continuous Attributes | ECIR 2003 | Information Technology | Text Categorization


» Database Support for Probabilistic Attributes and Tuples

» Evaluating the performance of cost-based discretization versus entropy and error-based discr...

» Dimensionality Reduction via Genetic Value Clustering

» A flat direct model for speech recognition

Post Info

Added	31 Oct 2010
Updated	31 Oct 2010
Type	Conference
Year	2003
Where	ECIR
Authors	Pio Nardiello, Fabrizio Sebastiani, Alessandro Sperduti


Sciweavers
