
KDD
2003
ACM

Cross-training: learning probabilistic mappings between topics

Classification is a well-established operation in text mining. Given a set of labels A and a set D_A of training documents tagged with these labels, a classifier learns to assign labels to unlabeled test documents. Suppose we also had available a different set of labels B, together with a set of documents D_B marked with labels from B. If A and B have some semantic overlap, can the availability of D_B help us build a better classifier for A, and vice versa? We answer this question in the affirmative by proposing cross-training: a new approach to semi-supervised learning in the presence of multiple label sets. We give distributional and discriminative algorithms for cross-training and show, through extensive experiments, that cross-training can discover and exploit probabilistic relations between two taxonomies for more accurate classification. Categories and subject descriptors: I.2.6 [Artificial intelligence]: Learning; I.5.2 [Pattern Recognition]: Design Methodology - classifier design and...
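The sketch below shows one simple way documents labeled with taxonomy B could inform a classifier for taxonomy A. It assumes documents are token lists and uses a hand-written multinomial naive Bayes as the base learner. A B classifier is trained first, and its predicted B label is then appended to each A document as a synthetic token. The class and helper names (`NaiveBayes`, `add_b_token`, `cross_train`, `classify_a`) and the `__B__=` token trick are hypothetical illustrations. They are not the authors' distributional or discriminative algorithms.

```python
from collections import Counter
import math


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing over token lists."""

    def fit(self, docs, labels):
        self.labels = sorted(set(labels))
        self.prior = Counter(labels)
        self.n = len(labels)
        # Per-label token counts.
        self.counts = {y: Counter() for y in self.labels}
        for doc, y in zip(docs, labels):
            self.counts[y].update(doc)
        self.vocab = set()
        for c in self.counts.values():
            self.vocab.update(c)
        self.totals = {y: sum(c.values()) for y, c in self.counts.items()}
        return self

    def predict(self, doc):
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for y in self.labels:
            # log P(y) + sum over tokens of log P(token | y), add-one smoothed
            score = math.log(self.prior[y] / self.n)
            for w in doc:
                score += math.log((self.counts[y][w] + 1) / (self.totals[y] + v))
            if score > best_score:
                best_label, best_score = y, score
        return best_label


def add_b_token(doc, b_model):
    """Hypothetical helper: append a synthetic token naming the predicted B label."""
    return list(doc) + ["__B__=" + b_model.predict(doc)]


def cross_train(docs_a, labels_a, docs_b, labels_b):
    """Train a B classifier, then an A classifier on B-augmented A documents."""
    b_model = NaiveBayes().fit(docs_b, labels_b)
    a_model = NaiveBayes().fit([add_b_token(d, b_model) for d in docs_a], labels_a)
    return a_model, b_model


def classify_a(doc, a_model, b_model):
    """Label an unseen document with A, using its predicted B label as extra evidence."""
    return a_model.predict(add_b_token(doc, b_model))


if __name__ == "__main__":
    docs_a = [["match", "goal"], ["stock", "bank"]]
    labels_a = ["Sports", "Finance"]
    docs_b = [["match", "goal", "team", "coach"], ["stock", "bank", "shares", "bond"]]
    labels_b = ["athletics", "markets"]
    a_model, b_model = cross_train(docs_a, labels_a, docs_b, labels_b)
    # "team" and "coach" never occur in the A training set, but the B model knows them.
    print(classify_a(["team", "coach"], a_model, b_model))  # prints Sports
```

The synthetic token lets the A classifier learn, from the A-labeled documents alone, how strongly each B label co-occurs with each A label. Keeping only the argmax B label is a simplification. Passing the full B posterior would preserve more of the uncertainty in the mapping between the two taxonomies.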
Sunita Sarawagi, Soumen Chakrabarti, Shantanu Godbole
Type Conference
Year 2003
Where KDD
Authors Sunita Sarawagi, Soumen Chakrabarti, Shantanu Godbole