Sciweavers

SIGIR
2010
ACM

Combining coregularization and consensus-based self-training for multilingual text categorization

We investigate the problem of learning document classifiers in a multilingual setting, from collections where labels are only partially available. We address this problem in the framework of multiview learning, where different languages correspond to different views of the same document, combined with semi-supervised learning in order to benefit from unlabeled documents. We rely on two techniques, coregularization and consensus-based self-training, that combine multiview and semi-supervised learning in different ways. Our approach trains different monolingual classifiers on each of the views, such that the classifiers’ decisions over a set of unlabeled examples are in agreement as much as possible, and iteratively labels new examples from another unlabeled training set based on a consensus across language-specific classifiers. We derive a boosting-based training algorithm for this task, and analyze the impact of the number of views on the semi-supervised learning results o...
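The consensus-based self-training step described in the abstract can be illustrated with a minimal sketch. This is not the authors' boosting-based algorithm and it does not implement coregularization: it assumes a simple nearest-centroid classifier per view, a unanimous-vote consensus rule, and toy two-dimensional feature vectors standing in for the language-specific document representations. All function names and data below are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Views = Tuple[Vector, ...]  # one feature vector per language (view)


def centroid(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def fit_view(xs: List[Vector], ys: List[int]) -> Dict[int, List[float]]:
    """Assumed stand-in classifier: one centroid per class for a single view."""
    return {label: centroid([x for x, y in zip(xs, ys) if y == label])
            for label in set(ys)}


def predict_view(model: Dict[int, List[float]], x: Vector) -> int:
    """Return the class whose centroid is nearest (squared Euclidean distance)."""
    return min(model,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(x, model[c])))


def fit_all(labeled: List[Tuple[Views, int]]) -> List[Dict[int, List[float]]]:
    """Train one monolingual classifier per view on the current labeled set."""
    n_views = len(labeled[0][0])
    ys = [y for _, y in labeled]
    return [fit_view([v[k] for v, _ in labeled], ys) for k in range(n_views)]


def consensus_self_train(labeled: List[Tuple[Views, int]],
                         unlabeled: List[Views],
                         max_rounds: int = 10):
    """Iteratively move unlabeled examples into the labeled set, but only
    when every view-specific classifier predicts the same class."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(max_rounds):
        models = fit_all(labeled)
        newly, remaining = [], []
        for views in pool:
            votes = {predict_view(m, x) for m, x in zip(models, views)}
            if len(votes) == 1:          # unanimous consensus across views
                newly.append((views, votes.pop()))
            else:                        # disagreement: leave unlabeled
                remaining.append(views)
        if not newly:
            break
        labeled.extend(newly)
        pool = remaining
    return fit_all(labeled), labeled, pool


# Toy data (hypothetical): two views per document, classes 0 and 1.
seed = [(((0.0, 0.0), (0.0, 0.0)), 0),
        (((1.0, 1.0), (1.0, 1.0)), 1)]
unlab = [((0.1, 0.2), (0.0, 0.1)),   # both views close to class 0
         ((0.9, 0.8), (1.0, 0.9)),   # both views close to class 1
         ((0.1, 0.1), (0.9, 0.9))]   # views disagree
models, labeled_out, pool_out = consensus_self_train(seed, unlab)
print(len(labeled_out), len(pool_out))
```

In this sketch, the example on which the two views disagree is never pseudo-labeled, which is the intuition behind using cross-language consensus as a confidence signal.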
Massih-Reza Amini, Cyril Goutte, Nicolas Usunier
Added: 24 Aug 2010
Updated: 24 Aug 2010
Type: Conference
Year: 2010
Where: SIGIR