This paper proposes a novel framework for automatic text categorization based on the kernel density classifier. The goal is to address two main issues in automatic text categorization: interpretability and performance. To address interpretability, Latent Semantic Analysis is used to construct a topic space in which each dimension represents a single topic, and the text features are extracted directly from this topic space. To address performance, the classifiers' parameters are optimized for either cost-sensitive or non-cost-sensitive categorization. We evaluated the proposed framework experimentally on a corpus of twenty newsgroups. The experimental results confirm that the framework effectively uses features from the topic model for cost-sensitive categorization.
Dwi Sianto Mansjur, Ted S. Wada, Biing-Hwang Juang
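
As an illustration of the classification step summarized in the abstract, the following is a minimal sketch of a kernel density classifier with an optional cost-sensitive decision rule. It assumes that documents have already been mapped to topic-space feature vectors, and it uses an isotropic Gaussian kernel with a single fixed bandwidth. The function names `gaussian_kde` and `classify`, the `bandwidth` value, and the `cost` matrix are hypothetical choices made for this sketch; the abstract does not specify the kernel or how the parameters are optimized.

```python
import math


def gaussian_kde(x, samples, bandwidth):
    """Density estimate at x: mean of isotropic Gaussian kernels on the samples."""
    d = len(x)
    norm = (2.0 * math.pi * bandwidth ** 2) ** (-d / 2.0)
    total = 0.0
    for s in samples:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, s))
        total += math.exp(-sq_dist / (2.0 * bandwidth ** 2))
    return norm * total / len(samples)


def classify(x, train, bandwidth, cost=None):
    """Assign x to a class.

    train: dict mapping class label -> list of topic-space vectors (assumed given).
    cost:  optional dict, cost[true][pred] = cost of predicting pred when the
           true class is true (hypothetical values supplied by the caller).
    """
    labels = sorted(train)
    n_total = sum(len(vectors) for vectors in train.values())
    # Unnormalized posterior: class prior times class-conditional density.
    score = {
        c: (len(train[c]) / n_total) * gaussian_kde(x, train[c], bandwidth)
        for c in labels
    }
    if cost is None:
        # Non-cost-sensitive: pick the class with the largest posterior.
        return max(labels, key=lambda c: score[c])
    # Cost-sensitive: pick the prediction with the smallest expected cost.
    risk = {p: sum(cost[t][p] * score[t] for t in labels) for p in labels}
    return min(labels, key=lambda p: risk[p])


# Toy two-dimensional "topic space" with two classes (made-up data).
train = {
    "sci": [[0.0, 0.0], [0.2, 0.1]],
    "rec": [[1.0, 1.0], [0.9, 1.2]],
}
cost = {"sci": {"sci": 0.0, "rec": 1.0}, "rec": {"sci": 5.0, "rec": 0.0}}

plain = classify([0.5, 0.5], train, bandwidth=0.5)
weighted = classify([0.5, 0.5], train, bandwidth=0.5, cost=cost)
```

In this toy setup, the query point lies slightly closer to the `sci` samples, so the plain rule and the cost-sensitive rule can disagree when misclassifying a `rec` document is made more expensive.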