Sciweavers

KDD
1999
ACM

On the Merits of Building Categorization Systems by Supervised Clustering

This paper investigates the use of supervised clustering in order to create sets of categories for classification of documents. We use information from a pre-existing taxonomy in order to supervise the creation of a set of related clusters, though with some freedom in defining and creating the classes. We show that the advantage of using supervised clustering is that it is possible to have some control over the range of subjects that one would like the categorization system to address, but with a precise mathematical definition of each category. We then categorize documents using this a priori knowledge of the definition of each category. We also discuss a new technique to help the classifier distinguish better among closely related clusters. Finally, we show empirically that this categorization system utilizing a machine-derived taxonomy performs as well as a manual categorization process, but at a far lower cost.
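As a rough illustration of what "a precise mathematical definition of each category" might look like, the sketch below represents each category as the mean term vector (centroid) of documents seeded from an existing taxonomy, then assigns a new document to the most similar centroid. This is an assumption made for illustration only, not the paper's algorithm: the abstract does not specify the category representation or the similarity measure, and the function names, whitespace tokenizer, and cosine similarity are all hypothetical choices. The re-clustering step and the technique for separating closely related clusters are not modeled here.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Bag-of-words term counts (assumed representation; no stemming or stop words)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_centroids(labeled_docs):
    """Average the term vectors of the documents under each taxonomy label.

    labeled_docs: iterable of (label, text) pairs taken from a pre-existing taxonomy.
    Returns a dict mapping each label to its mean term vector.
    """
    sums = defaultdict(Counter)
    counts = Counter()
    for label, text in labeled_docs:
        sums[label].update(vectorize(text))
        counts[label] += 1
    return {
        label: {t: w / counts[label] for t, w in vec.items()}
        for label, vec in sums.items()
    }


def categorize(text, centroids):
    """Assign a document to the category whose centroid is most similar."""
    vec = vectorize(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical seed documents standing in for a taxonomy.
seed = [
    ("sports", "football match goal team"),
    ("sports", "team wins match"),
    ("finance", "stock market shares price"),
    ("finance", "bank shares market"),
]
centroids = build_centroids(seed)
print(categorize("the team scored a goal", centroids))
print(categorize("market price of shares", centroids))
```

In this sketch the taxonomy supplies only the initial grouping. A supervised clustering system of the kind the abstract describes would presumably go on to merge, drop, or refine these groups, which is where the "freedom in defining and creating the classes" would come in.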
Charu C. Aggarwal, Stephen C. Gates, Philip S. Yu
Added 04 Aug 2010
Updated 04 Aug 2010
Type Conference
Year 1999
Where KDD
Authors Charu C. Aggarwal, Stephen C. Gates, Philip S. Yu