
SAC
2008
ACM

An efficient feature ranking measure for text categorization

A major obstacle that degrades the performance of text classifiers is the extremely high dimensionality of text data. To reduce the dimensionality, a number of approaches based on rough-set theory have been proposed. However, these approaches often suffer from two problems: first, they cannot deal directly with continuous text features; second, they often incur considerable running time. To address the first issue, we extend the discernibility matrix so that it can work with continuous features. To cut down running time, we use centroids rather than examples to construct the discernibility matrix, which reduces the time complexity from O(T^2 W) to O(K^2 W), where T denotes the number of training examples, K denotes the number of training classes, and W denotes the size of the vocabulary. The experimental results indicate that the proposed method not only yields much higher accuracy than Information Gain when the number of selected features is smaller than 6000, but also in...
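The abstract does not give the exact scoring rule, so the following is only a minimal sketch of the general idea: comparing one centroid per class instead of comparing every pair of training examples. The function names (`class_centroids`, `centroid_feature_scores`, `rank_features`) and the use of the summed absolute centroid difference as the score are assumptions made for illustration, not the authors' measure.

```python
from itertools import combinations


def class_centroids(X, y):
    """Average the feature vectors of each class (one centroid per class)."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
            counts[label] = 0
        for k, value in enumerate(row):
            sums[label][k] += value
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def centroid_feature_scores(X, y):
    """Hypothetical score: summed absolute difference between class centroids.

    The pairwise loop visits K(K-1)/2 centroid pairs and W features per pair,
    which matches the O(K^2 W) cost quoted in the abstract. The scoring rule
    itself is an assumption, not taken from the paper.
    """
    centroids = class_centroids(X, y)
    width = len(X[0])
    scores = [0.0] * width
    for a, b in combinations(sorted(centroids), 2):
        for k in range(width):
            scores[k] += abs(centroids[a][k] - centroids[b][k])
    return scores


def rank_features(X, y, top_n):
    """Return the indices of the top_n features, highest score first."""
    scores = centroid_feature_scores(X, y)
    order = sorted(range(len(scores)), key=lambda k: -scores[k])
    return order[:top_n]


if __name__ == "__main__":
    # Toy continuous-valued data: 4 documents, 3 features, 2 classes.
    X = [[1.0, 0.0, 0.5], [0.8, 0.0, 0.5], [0.0, 1.0, 0.5], [0.0, 0.9, 0.5]]
    y = [0, 0, 1, 1]
    print(rank_features(X, y, top_n=2))
```

Because the features are compared as real-valued centroid coordinates, continuous weights such as TF-IDF need no discretization in this sketch. Building the centroids still takes one pass over all T examples; only the pairwise comparison step shrinks from example pairs to class pairs.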
Songbo Tan, Yuefen Wang, Xueqi Cheng
Added 28 Dec 2010
Updated 28 Dec 2010
Type Journal
Year 2008
Where SAC
Authors Songbo Tan, Yuefen Wang, Xueqi Cheng