The paper presents an approach to the task of automatic document categorization in the field of economics. Since the documents can be annotated with multiple keywords (labels), we ...
Abstract. Most common feature selection techniques for document categorization are supervised and require lots of training data in order to accurately capture the descriptive and d...
In this paper, we propose a new algorithm, which incorporates the relationships of concept-based thesauri into the document categorization using the k-NN classifier (k-NN). k-NN i...
This paper presents a new approach for automatic document categorization. Exploiting the logical structure of the document, our approach assigns a HTML document to one or more cate...
We study dimensionality reduction or feature selection in text document categorization problem. We focus on the first step in building text categorization systems, that is the cho...
Word fragments or n-grams have been widely used to perform different Natural Language Processing tasks such as information retrieval [1] [2], document categorization [3], automatic...
Traditionally, research in identifying structured entities in documents has proceeded independently of document categorization research. In this paper, we observe that these two t...