Sciweavers

IBPRIA
2005
Springer

Extended Bi-gram Features in Text Categorization

Usually, in traditional text categorization systems based on the Vector Space Model, a feature vector carries no context information, which limits the performance of the system. To make use of more information, it is natural to select bi-gram features in addition to unigram features. However, the longer the feature is, the more important the feature selection algorithm becomes for achieving a good balance in the feature space. This paper proposes two feature extraction methods that achieve a better feature balance for document categorization. Experiments show that our extended bi-gram features greatly improve system performance.
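As a rough illustration of the general idea only (adding bi-gram features to a unigram Vector Space Model representation), a minimal Python sketch might look like the following. The abstract does not describe the paper's two extraction methods, so this does not reproduce them: the whitespace tokenizer, the document-frequency threshold `min_df`, and all function names are assumptions made for illustration.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace splitting stands in for a real tokenizer.
    return text.lower().split()


def unigram_bigram_counts(text):
    # Count unigrams, then add adjacent word pairs as bi-gram features.
    tokens = tokenize(text)
    features = Counter(tokens)
    features.update(f"{a}_{b}" for a, b in zip(tokens, tokens[1:]))
    return features


def select_features(docs, min_df=2):
    # Assumption: a plain document-frequency cutoff stands in for feature
    # selection; it is NOT one of the paper's two proposed methods.
    df = Counter()
    for doc in docs:
        df.update(set(unigram_bigram_counts(doc)))
    return sorted(f for f, n in df.items() if n >= min_df)


def vectorize(text, vocabulary):
    # Map a document to a count vector over the selected features.
    counts = unigram_bigram_counts(text)
    return [counts[f] for f in vocabulary]


docs = [
    "stock market falls",
    "stock market rises",
    "market stock photo",
]
vocab = select_features(docs, min_df=2)
vectors = [vectorize(d, vocab) for d in docs]
```

In this toy corpus all three documents contain both "stock" and "market", so unigram counts alone cannot separate them; the bi-gram feature `stock_market` appears only in the first two, which is the kind of context information the abstract refers to.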
Xian Zhang, Xiaoyan Zhu
Added 27 Jun 2010
Updated 27 Jun 2010
Type Conference
Year 2005
Where IBPRIA
Authors Xian Zhang, Xiaoyan Zhu