AIRS
2004
Springer

Automatic Word Clustering for Text Categorization Using Global Information

This paper presents a cluster-based text categorization system that uses class distributional clustering of words. We propose a new clustering model that considers global information over all the clusters. The model groups words into clusters based on the distribution of class labels associated with each word. Using these learned clusters as features, we develop a cluster-based classifier. Experimental results show that the proposed method performs better than three other text classifiers, and better than a model that considers only the information of the two related clusters. In particular, it maintains good performance when the number of features is small and the training corpus is small.

Categories and Subject Descriptors: I.5.3 [PATTERN RECOGNITION]: Clustering - Similarity measures; I.5.4 [PATTERN RECOGNITION]: Application - Text processing; I.5.m [PATTERN RECOGNITION]: Miscellaneous

General Terms: Algorithms...
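
To make the idea of class distributional clustering concrete, here is a minimal Python sketch. It is not the authors' model: the abstract does not give the global criterion, so the sketch uses the simpler pairwise scheme that the abstract compares against, merging at each step the two clusters whose class-label distributions are closest. The use of a weighted Jensen-Shannon divergence as the distance, the greedy merge loop, and all function names (class_distributions, weighted_js, cluster_words) are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def class_distributions(docs):
    """Estimate P(label | word) from (tokens, label) pairs.

    Returns (dists, totals): dists maps word -> {label: probability},
    totals maps word -> number of occurrences.
    """
    counts = defaultdict(Counter)
    for tokens, label in docs:
        for tok in tokens:
            counts[tok][label] += 1
    dists, totals = {}, {}
    for word, c in counts.items():
        total = sum(c.values())
        totals[word] = total
        dists[word] = {label: n / total for label, n in c.items()}
    return dists, totals


def _kl(p, m):
    # KL(p || m) in bits; m is a mixture containing p, so m[k] > 0 when p[k] > 0.
    return sum(pv * math.log2(pv / m[k]) for k, pv in p.items() if pv > 0)


def weighted_js(p, q, wp, wq):
    """Weighted Jensen-Shannon divergence between two class distributions."""
    a, b = wp / (wp + wq), wq / (wp + wq)
    labels = set(p) | set(q)
    m = {l: a * p.get(l, 0.0) + b * q.get(l, 0.0) for l in labels}
    return a * _kl(p, m) + b * _kl(q, m)


def cluster_words(docs, k):
    """Greedy pairwise agglomerative clustering of words into k clusters.

    Assumption: only the two candidate clusters are compared at each merge,
    i.e. this is the pairwise baseline, not the paper's global model.
    """
    dists, totals = class_distributions(docs)
    clusters = [([w], dists[w], totals[w]) for w in sorted(dists)]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = weighted_js(clusters[i][1], clusters[j][1],
                                clusters[i][2], clusters[j][2])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        wi, pi, ni = clusters[i]
        wj, pj, nj = clusters[j]
        n = ni + nj
        # The merged cluster's distribution is the count-weighted average.
        merged = {l: (ni * pi.get(l, 0.0) + nj * pj.get(l, 0.0)) / n
                  for l in set(pi) | set(pj)}
        clusters[i] = (wi + wj, merged, n)
        del clusters[j]
    return [sorted(c[0]) for c in clusters]


if __name__ == "__main__":
    toy_docs = [
        (["goal", "match"], "sport"),
        (["goal", "team"], "sport"),
        (["vote", "senate"], "politics"),
        (["vote", "law"], "politics"),
    ]
    print(cluster_words(toy_docs, 2))
```

Each resulting cluster can then serve as a single feature for a classifier, with a document's count for that feature being the sum of the counts of the cluster's member words. The quadratic search over cluster pairs keeps the sketch short; it is only practical for small vocabularies.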
Added 30 Jun 2010
Updated 30 Jun 2010
Type Conference
Year 2004
Where AIRS
Authors Wenliang Chen, Xingzhi Chang, Huizhen Wang, Jingbo Zhu, Tianshun Yao