Feature Engineering for Text Classification

Most research in text classification to date has used a “bag of words” representation in which each feature corresponds to a single word. This paper examines some alternative ways to represent text based on syntactic and semantic relationships between words (phrases, synonyms and hypernyms). We describe the new representations and try to justify our hypothesis that they could improve the performance of a rule-based learner. The representations are evaluated using the RIPPER learning algorithm on the Reuters-21578 and DigiTrad test corpora. On their own the new representations are not found to produce significant performance improvements. We also try combining classifiers based on different representations using a majority voting technique, and this improves performance on both test collections. In our opinion, more sophisticated Natural Language Processing techniques need to be developed before better text representations can be produced for classification.
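As a rough illustration of two ideas named in the abstract, the sketch below builds a "bag of words" feature map and combines label predictions by majority voting. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the example labels, and the tie-breaking rule are all hypothetical, and the per-representation classifiers (RIPPER in the paper) are stood in for by hard-coded predictions.

```python
from collections import Counter


def bag_of_words(text):
    """Map a document to word counts: one feature per distinct word.

    Lowercasing and whitespace tokenisation are simplifying assumptions.
    """
    return Counter(text.lower().split())


def majority_vote(predictions):
    """Return the label predicted by the most classifiers.

    Ties go to the label that appears first in `predictions`
    (an arbitrary choice made for this sketch).
    """
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical outputs of three classifiers, each trained on a different
# representation of the same document (words, phrases, hypernyms).
votes = ["grain", "grain", "other"]
combined = majority_vote(votes)  # -> "grain"

features = bag_of_words("Wheat prices rise as wheat exports fall")
```

With an odd number of voters on a two-class problem, ties cannot occur, so the tie-breaking rule only matters in other settings.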

Sam Scott, Stan Matwin

Better Text Representations | DigiTrad Test Corpora | ICML 1999 | Machine Learning | RIPPER Learning Algorithm

Added	02 Aug 2010
Updated	02 Aug 2010
Type	Conference
Year	1999
Where	ICML
Authors	Sam Scott, Stan Matwin
