Sciweavers

ISCIS
2005
Springer

Text Categorization with Class-Based and Corpus-Based Keyword Selection

Abstract. In this paper, we examine the use of keywords in text categorization with SVM. Contrary to common belief, we show that using keywords instead of all words yields better performance in terms of both accuracy and time. Unlike previous studies, which focus on keyword selection metrics, we compare two approaches to keyword selection. In the corpus-based approach, a single set of keywords is selected for all classes. In the class-based approach, a distinct set of keywords is selected for each class. We perform the experiments on the standard Reuters-21578 dataset, with both boolean and tf-idf weighting. Our results show that although tf-idf weighting performs better, boolean weighting can be used where time and space resources are limited. The corpus-based approach with 2000 keywords performs best. However, for a small number of keywords, the class-based approach outperforms the corpus-based approach with the same number of keywords.
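The difference between the two selection approaches, and between the two weighting schemes, can be illustrated with a small Python sketch. It is not the authors' implementation: the abstract does not name the scoring metric, so ranking terms by their summed tf-idf weight is an assumption made here, and the function names and toy documents are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors


def top_k(scores, k):
    """Pick the k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def corpus_based_keywords(vectors, k):
    """One shared keyword set: terms are scored over the whole corpus."""
    totals = defaultdict(float)
    for vec in vectors:
        for term, weight in vec.items():
            totals[term] += weight
    return top_k(totals, k)


def class_based_keywords(vectors, labels, k):
    """One keyword set per class: terms are scored within each class only."""
    per_class = defaultdict(lambda: defaultdict(float))
    for vec, label in zip(vectors, labels):
        for term, weight in vec.items():
            per_class[label][term] += weight
    return {label: top_k(scores, k) for label, scores in per_class.items()}


def boolean_features(doc, keywords):
    """Boolean weighting: 1 if the keyword occurs in the document, else 0."""
    present = set(doc)
    return [1 if term in present else 0 for term in keywords]


# Hypothetical toy corpus (not Reuters-21578 data).
docs = [
    ["wheat", "grain", "export", "price"],
    ["wheat", "harvest", "grain", "grain"],
    ["oil", "crude", "price", "barrel"],
    ["oil", "opec", "crude", "crude"],
]
labels = ["grain", "grain", "crude", "crude"]

vectors = tfidf_vectors(docs)
shared_keywords = corpus_based_keywords(vectors, k=2)
keywords_by_class = class_based_keywords(vectors, labels, k=1)
```

In the corpus-based case every classifier sees the same feature space, whereas in the class-based case each per-class classifier would be trained on its own keyword list. The resulting feature vectors, boolean or tf-idf, would then be passed to an SVM, which is omitted here.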
Added 27 Jun 2010
Updated 27 Jun 2010
Type Conference
Year 2005
Where ISCIS
Authors Arzucan Özgür, Levent Özgür, Tunga Güngör