Sciweavers

IPM
2006

Hierarchical document categorization with k-NN and concept-based thesauri

In this paper, we propose a new algorithm that incorporates the relationships of concept-based thesauri into document categorization using the k-NN classifier (k-NN). k-NN is one of the most popular document categorization methods because it performs relatively well despite its simplicity. However, its precision degrades significantly when ambiguity arises, i.e., when there exists more than one candidate category to which a document can be assigned. To remedy this drawback, we employ concept-based thesauri in the categorization. Employing a thesaurus entails structuring categories into hierarchies, since their structure needs to conform to that of the thesaurus in order to capture relationships between categories. By referencing the various relationships in the thesaurus that correspond to the structured categories, k-NN can be substantially improved, removing the ambiguity. In this paper, we first perform the document categorization by using k-NN and then employ the relation...
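The abstract is truncated before it describes the second stage, so the following Python sketch only illustrates the general idea and is not the authors' algorithm. It scores categories with a plain k-NN vote over cosine similarity, flags the result as ambiguous when the top two categories are close, and then breaks the tie with a toy category hierarchy. The `PARENT` mapping, the `margin` threshold, and the "best parent first, then best child" rule are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
import math

# Hypothetical category hierarchy (child -> parent); not taken from the paper.
PARENT = {"databases": "computing", "networks": "computing", "football": "sports"}

def vectorize(text):
    """Bag-of-words term-frequency vector as a Counter."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_scores(doc, training, k=3):
    """Sum the similarities of the k nearest training documents per category."""
    query = vectorize(doc)
    neighbours = sorted(
        ((cosine(query, vectorize(text)), cat) for text, cat in training),
        reverse=True,
    )[:k]
    scores = defaultdict(float)
    for sim, cat in neighbours:
        scores[cat] += sim
    return dict(scores)

def is_ambiguous(scores, margin=0.1):
    """Assumed ambiguity test: the top two category scores differ by less than margin."""
    ranked = sorted(scores.values(), reverse=True)
    return len(ranked) > 1 and ranked[0] - ranked[1] < margin

def resolve(scores, parent):
    """Assumed tie-break: pick the best-scoring parent, then its best child."""
    by_parent = defaultdict(float)
    for cat, s in scores.items():
        by_parent[parent.get(cat, cat)] += s
    best_parent = max(by_parent, key=by_parent.get)
    children = {c: s for c, s in scores.items() if parent.get(c, c) == best_parent}
    return max(children, key=children.get)

def categorize(doc, training, parent, k=3, margin=0.1):
    """Plain k-NN first; consult the hierarchy only when the vote is ambiguous."""
    scores = knn_scores(doc, training, k)
    if not scores:
        return None
    if is_ambiguous(scores, margin):
        return resolve(scores, parent)
    return max(scores, key=scores.get)
```

In this sketch, two sibling categories that split the vote can jointly outweigh an unrelated category that narrowly leads, which is one simple way a hierarchy can remove the kind of ambiguity the abstract describes.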
Sun Lee Bang, Jae Dong Yang, Hyung Jeong Yang
Added 13 Dec 2010
Updated 13 Dec 2010
Type Journal
Year 2006
Where IPM