Sciweavers

EMNLP
2004

A Boosting Algorithm for Classification of Semi-Structured Text

14 years 1 months ago
A Boosting Algorithm for Classification of Semi-Structured Text
The focus of research in text classification has expanded from simple topic identification to more challenging tasks such as opinion/modality identification. Unfortunately, the latter goals exceed the ability of the traditional bag-of-word representation approach, and a richer, more structural representation is required. Accordingly, learning algorithms must be created that can handle the structures observed in texts. In this paper, we propose a Boosting algorithm that captures sub-structures embedded in texts. The proposal consists of i) decision stumps that use subtrees as features and ii) the Boosting algorithm which employs the subtree-based decision stumps as weak learners. We also discuss the relation between our algorithm and SVMs with tree kernel. Two experiments on opinion/modality classification confirm that subtree features are important.
Taku Kudo, Yuji Matsumoto
Added 30 Oct 2010
Updated 30 Oct 2010
Type Conference
Year 2004
Where EMNLP
Authors Taku Kudo, Yuji Matsumoto
Comments (0)