Text Categorization with Class-Based and Corpus-Based Keyword Selection


Abstract. In this paper, we examine the use of keywords in text categorization with SVM. Contrary to the usual belief, we show that using keywords instead of all words yields better performance in terms of both accuracy and time. Unlike previous studies, which focus on keyword selection metrics, we compare two approaches to keyword selection. In the corpus-based approach, a single set of keywords is selected for all classes. In the class-based approach, a distinct set of keywords is selected for each class. We perform the experiments on the standard Reuters-21578 dataset, with both boolean and tf-idf weighting. Our results show that although tf-idf weighting performs better, boolean weighting can be used where time and space resources are limited. The corpus-based approach with 2000 keywords performs best. However, for a small number of keywords, the class-based approach outperforms the corpus-based approach with the same number of keywords.
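
The difference between the two selection approaches, and between the two weighting schemes, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state which scoring metric ranks the keywords, so plain document frequency is used here as a stand-in, and the tf-idf formula (raw term count times log(N/df)) is one common variant chosen by assumption. All function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def doc_frequencies(docs):
    """Count, for each word, how many documents contain it."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def top_k(scores, k):
    """Return the k highest-scoring words; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


def corpus_based_keywords(docs, k):
    """Corpus-based: one keyword set shared by all classes.

    ASSUMPTION: words are scored by document frequency over the whole corpus.
    """
    return top_k(doc_frequencies(docs), k)


def class_based_keywords(docs, labels, k):
    """Class-based: a distinct keyword set per class.

    ASSUMPTION: words are scored by document frequency within each class.
    """
    by_class = {}
    for tokens, label in zip(docs, labels):
        by_class.setdefault(label, []).append(tokens)
    return {
        label: top_k(doc_frequencies(class_docs), k)
        for label, class_docs in by_class.items()
    }


def boolean_vector(tokens, keywords):
    """Boolean weighting: 1 if the keyword occurs in the document, else 0."""
    present = set(tokens)
    return [1 if word in present else 0 for word in keywords]


def tfidf_vector(tokens, keywords, df, n_docs):
    """tf-idf weighting (assumed variant): term count * log(n_docs / df)."""
    counts = Counter(tokens)
    return [
        counts[word] * math.log(n_docs / df[word]) if df.get(word, 0) else 0.0
        for word in keywords
    ]


# Toy corpus (hypothetical), already tokenized.
docs = [
    ["wheat", "export", "price"],
    ["wheat", "harvest", "price"],
    ["oil", "barrel", "price"],
    ["oil", "export", "barrel"],
]
labels = ["grain", "grain", "crude", "crude"]

corpus_kw = corpus_based_keywords(docs, 2)
class_kw = class_based_keywords(docs, labels, 2)
print(corpus_kw)
print(class_kw)
print(boolean_vector(docs[0], corpus_kw))
print(tfidf_vector(docs[0], corpus_kw, doc_frequencies(docs), len(docs)))
```

On this toy corpus with two keywords, the corpus-based set is "price" and "barrel", while the class-based sets are "price" and "wheat" for the grain class and "barrel" and "oil" for the crude class. In a real pipeline, the resulting vectors would be passed to an SVM classifier; with class-based selection, each class's classifier would see vectors built from that class's own keyword set.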



Class-based Approach | Corpus-based Approach | ISCIS 2005 | Keywords |


Post Info

Added	27 Jun 2010
Updated	27 Jun 2010
Type	Conference
Year	2005
Where	ISCIS
Authors	Arzucan Özgür, Levent Özgür, Tunga Güngör
