A Re-Examination of Text Categorization Methods


This paper reports a controlled study with statistical significance tests on five text categorization methods: Support Vector Machines (SVM), a k-Nearest Neighbor (kNN) classifier, a neural network (NNet) approach, the Linear Least-squares Fit (LLSF) mapping and a Naive Bayes (NB) classifier. We focus on the robustness of these methods in dealing with a skewed category distribution, and their performance as a function of the training-set category frequency. Our results show that SVM, kNN and LLSF significantly outperform NNet and NB when the number of positive training instances per category is small (less than ten), and that all the methods perform comparably when the categories are sufficiently common (over 300 instances).
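To make "performance as a function of the training-set category frequency" concrete, the following is a minimal sketch of how per-category scores might be grouped by the number of positive training instances. The bucket names, the function names, and the example numbers are all hypothetical; only the thresholds (fewer than ten, over 300) come from the abstract, and the abstract does not say that the authors computed anything this way.

```python
from collections import defaultdict


def frequency_bucket(n_pos):
    """Map a category's positive training count to a bucket name.

    The thresholds (<10 and >300) follow the abstract; the bucket
    names themselves are an assumption made for this sketch.
    """
    if n_pos < 10:
        return "rare"
    if n_pos > 300:
        return "common"
    return "medium"


def mean_score_by_bucket(train_counts, scores):
    """Average one method's per-category score within each bucket.

    train_counts: {category: number of positive training instances}
    scores:       {category: score of one method on that category}
    """
    grouped = defaultdict(list)
    for category, n_pos in train_counts.items():
        grouped[frequency_bucket(n_pos)].append(scores[category])
    return {bucket: sum(vals) / len(vals) for bucket, vals in grouped.items()}


# Hypothetical toy data, not results from the paper.
train_counts = {"a": 2, "b": 8, "c": 50, "d": 400}
scores = {"a": 0.5, "b": 1.0, "c": 0.5, "d": 0.75}
by_bucket = mean_score_by_bucket(train_counts, scores)
```

Running the same grouping once per method would give one row per classifier and one column per frequency bucket, which is the kind of comparison the abstract summarizes.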

Yiming Yang, Xin Liu


Information Management | SIGIR 1999 | Significance Tests | Skewed Category Distribution | Training-set Category Frequency


Post Info

Added	03 Aug 2010
Updated	03 Aug 2010
Type	Conference
Year	1999
Where	SIGIR
Authors	Yiming Yang, Xin Liu
