A major obstacle that degrades the performance of text classifiers is the extremely high dimensionality of text data. To reduce the dimensionality, a number of approaches based on rough-set theory have been proposed. However, these works often suffer from two problems: first, they cannot directly handle continuous text features; second, they often incur considerable running time. To address the first issue, we extend the discernibility matrix so that it can work with continuous features. To cut down the running time, we construct the discernibility matrix from class centroids rather than individual training examples, which reduces the time complexity from O(T²W) to O(K²W), where T denotes the number of training examples, K the number of training classes, and W the size of the vocabulary. The experimental results indicate that the proposed method not only yields much higher accuracy than Information Gain when the number of selected features is smaller than 6000, but also incurs much less running time.
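To make the idea concrete, the sketch below illustrates a centroid-based discernibility construction of the kind described above. It is only a minimal illustration, not the paper's exact algorithm: the threshold `eps` used to decide when two centroids are discernible on a continuous feature, and the greedy reduct-style selection step, are assumptions introduced here for clarity.

```python
import numpy as np

def centroid_discernibility_selection(X, y, n_features, eps=0.1):
    """Select features from a centroid-based discernibility matrix.

    X : (T, W) array of continuous text features (e.g. tf-idf weights).
    y : (T,) array of class labels.
    n_features : number of features to keep.
    eps : hypothetical threshold deciding when two centroids are
          "discernible" on a continuous feature.
    """
    classes = np.unique(y)
    K, W = len(classes), X.shape[1]

    # Class centroids: one W-dimensional vector per class.
    centroids = np.vstack([X[y == c].mean(axis=0) for c in classes])

    # Discernibility matrix over centroid pairs: entry (i, j) holds the
    # set of features whose centroid values differ by more than eps.
    # Building it costs O(K^2 * W) instead of O(T^2 * W).
    pairs = []
    for i in range(K):
        for j in range(i + 1, K):
            diff = np.abs(centroids[i] - centroids[j]) > eps
            pairs.append(set(np.flatnonzero(diff)))

    # Greedy reduct-style heuristic (an assumption, not the paper's
    # procedure): repeatedly pick the feature that discerns the largest
    # number of still-uncovered centroid pairs.
    selected = []
    uncovered = [p for p in pairs if p]
    while uncovered and len(selected) < n_features:
        counts = np.zeros(W, dtype=int)
        for p in uncovered:
            for f in p:
                counts[f] += 1
        best = int(counts.argmax())
        if counts[best] == 0:
            break
        selected.append(best)
        uncovered = [p for p in uncovered if best not in p]
    return selected
```

Because the pairwise comparison runs over the K class centroids rather than the T training documents, the dominant cost of building the matrix drops from O(T²W) to O(K²W), which is the source of the running-time saving claimed above.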