Sciweavers

WWW
2005
ACM

Automatically learning document taxonomies for hierarchical classification

While several hierarchical classification methods have been applied to web content, such techniques invariably rely on a pre-defined taxonomy of documents. We propose a new technique that automatically extracts a suitable hierarchical structure from a corpus of labeled documents. We show that our technique groups similar classes closer together in the tree and discovers relationships among documents that are not encoded in the class labels. The learned taxonomy is then combined with binary SVMs to perform multi-class classification. We demonstrate the efficacy of our approach on the 20 Newsgroups dataset.

Categories and Subject Descriptors: H.3.1 [Information Systems]: Content Analysis and Indexing; H.3.3 [Information Systems]: Information Search and Retrieval
General Terms: Algorithms, Experimentation, Performance
Keywords: Automatic taxonomy learning, Hierarchical classification
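The abstract does not say how the taxonomy is extracted, so what follows is only a minimal sketch of the general idea under our own assumptions, not the authors' method. It assumes documents are dense term-count vectors, builds the taxonomy bottom-up by repeatedly merging the two classes (or class groups) whose centroids have the smallest cosine distance, and trains one binary linear SVM per internal node to route a document to its left or right subtree. All function names (`learn_taxonomy`, `train_hierarchy`, `predict`, and so on) are hypothetical, and the SVMs are fitted with a Pegasos-style subgradient method instead of a library solver so that the example stays dependency-free.

```python
import math
import random
from collections import defaultdict

# Assumed representation: a taxonomy node is either a class label (leaf)
# or a (left, right) tuple (internal node). Labels must not be tuples.


def centroid(vectors):
    """Component-wise mean of equal-length feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def leaves(tree):
    """Set of class labels below a taxonomy node."""
    if isinstance(tree, tuple):
        return leaves(tree[0]) | leaves(tree[1])
    return {tree}


def learn_taxonomy(X, y):
    """Illustrative criterion: merge the two class groups with the closest centroids."""
    by_label = defaultdict(list)
    for x, label in zip(X, y):
        by_label[label].append(x)
    clusters = [(label, by_label[label]) for label in sorted(by_label)]  # (subtree, docs)
    while len(clusters) > 1:
        cents = [centroid(docs) for _, docs in clusters]
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: cosine_distance(cents[p[0]], cents[p[1]]))
        (tree_i, docs_i), (tree_j, docs_j) = clusters[i], clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(((tree_i, tree_j), docs_i + docs_j))
    return clusters[0][0]


def train_linear_svm(X, t, lam=0.01, epochs=50, seed=0):
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.

    The bias is folded in as a constant feature, so it is regularized too.
    """
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    step = 0
    for _ in range(epochs):
        order = list(range(len(Xb)))
        rng.shuffle(order)
        for k in order:
            step += 1
            eta = 1.0 / (lam * step)
            margin = t[k] * sum(wi * xi for wi, xi in zip(w, Xb[k]))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink: regularization step
            if margin < 1.0:  # hinge loss is active: move toward the example
                w = [wi + eta * t[k] * xi for wi, xi in zip(w, Xb[k])]
    return w


def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


def train_hierarchy(tree, X, y):
    """One binary SVM per internal node: +1 = left subtree, -1 = right subtree."""
    if not isinstance(tree, tuple):
        return tree
    left, right = tree
    node_labels, left_labels = leaves(tree), leaves(left)
    idx = [k for k, label in enumerate(y) if label in node_labels]
    w = train_linear_svm([X[k] for k in idx], [1 if y[k] in left_labels else -1 for k in idx])
    return (w, train_hierarchy(left, X, y), train_hierarchy(right, X, y))


def predict(model, x):
    """Route a document from the root to a leaf, one binary decision per level."""
    while isinstance(model, tuple):
        w, left, right = model
        model = left if score(w, x) >= 0 else right
    return model
```

On toy data where the two `rec.sport.*` classes share vocabulary with each other but not with the two `comp.*` classes, `learn_taxonomy` places each pair under its own subtree, and `predict(train_hierarchy(tree, X, y), x)` then makes one binary SVM decision per level rather than running one classifier per class. Centroid linkage with cosine distance is just one plausible grouping criterion chosen for illustration; a practical system would use sparse TF-IDF vectors and a tuned SVM solver.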
Kunal Punera, Suju Rajan, Joydeep Ghosh
Added: 22 Nov 2009
Updated: 22 Nov 2009
Type: Conference
Year: 2005
Where: WWW
Authors: Kunal Punera, Suju Rajan, Joydeep Ghosh