Extended Bi-gram Features in Text Categorization


Download www.csai.tsinghua.edu.cn

Traditional text categorization systems based on the Vector Space Model usually carry no context information in the feature vector, which limits system performance. To make use of more information, it is natural to select bi-gram features in addition to unigram features. However, the longer the feature, the more important the feature selection algorithm becomes for achieving a good balance in the feature space. This paper proposes two feature extraction methods that achieve a better feature balance for document categorization. Experiments show that our extended bi-gram features improve system performance greatly.
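As a rough illustration of the general idea of adding bi-gram features to a unigram vector space representation, here is a minimal Python sketch. It is not the authors' extended bi-gram method, which the abstract does not describe. The function names (`extract_features`, `select_by_document_frequency`, `vectorize`) and the `min_df` threshold are hypothetical, and the simple document-frequency cutoff merely stands in for a real feature selection algorithm.

```python
from collections import Counter


def extract_features(tokens):
    """Count unigram and bi-gram features for one tokenized document.

    Every feature is a tuple of words: length 1 for a unigram,
    length 2 for a bi-gram of adjacent words.
    """
    features = Counter((w,) for w in tokens)
    features.update(zip(tokens, tokens[1:]))
    return features


def select_by_document_frequency(docs, min_df=2):
    """Keep features that occur in at least `min_df` documents.

    Assumption: a plain document-frequency cutoff is used here only as a
    placeholder for the feature selection step; it is not the paper's method.
    """
    df = Counter()
    for tokens in docs:
        df.update(set(extract_features(tokens)))
    return sorted(f for f, n in df.items() if n >= min_df)


def vectorize(tokens, vocabulary):
    """Map a tokenized document onto a fixed-order count vector."""
    feats = extract_features(tokens)
    return [feats.get(f, 0) for f in vocabulary]


docs = [
    ["text", "categorization", "system"],
    ["text", "categorization", "feature"],
]
vocabulary = select_by_document_frequency(docs, min_df=2)
vectors = [vectorize(d, vocabulary) for d in docs]
```

Because bi-grams are much sparser than unigrams, a single cutoff like `min_df` tends to discard most of them, which is one way to see why the balance between feature types depends on the selection step.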

Xian Zhang, Xiaoyan Zhu


Better Feature Balance | Bi-gram Feature | Feature Vector | IBPRIA 2005 | Image Analysis


» Web taxonomy integration with hierarchical shrinkage algorithm and fine-grained relations

» Using discriminant analysis for multiclass classification: an experimental investigation

Post Info

Added	27 Jun 2010
Updated	27 Jun 2010
Type	Conference
Year	2005
Where	IBPRIA
Authors	Xian Zhang, Xiaoyan Zhu


Sciweavers
