Combining labeled and unlabeled data with word-class distribution learning

We describe a novel, simple, and highly scalable semi-supervised method called Word-Class Distribution Learning (WCDL), and apply it to the task of information extraction (IE) by utilizing unlabeled sentences to improve supervised classification methods. WCDL iteratively builds class label distributions for each word in the dictionary by averaging predicted labels over all cases in the unlabeled corpus, and re-training a base classifier adding these distributions as word features. In contrast, traditional self-training or co-training methods add self-labeled examples (rather than features), which can degrade performance due to incestuous learning bias. WCDL exhibits robust behavior, and has no difficult parameters to tune. We applied our method to German and English named entity recognition (NER) tasks. WCDL shows improvements over self-training, multi-task semi-supervision, or supervision alone, in particular yielding a state-of-the-art 75.72 F1 score on the German NER task. Categories and...
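
The loop described in the abstract (predict labels on unlabeled text, average them per word, feed the averages back as features, retrain) can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the toy multiclass perceptron stands in for whatever base classifier the paper uses, and the label set, feature names, function names, number of rounds, and example sentences are all hypothetical.

```python
from collections import defaultdict

LABELS = ["O", "PER", "LOC"]  # hypothetical label set for illustration

def features(words, i, wcd):
    """Sparse features for token i; wcd maps word -> label distribution."""
    word = words[i].lower()
    prev = words[i - 1].lower() if i > 0 else "<s>"
    feats = {"bias": 1.0, "w=" + word: 1.0, "prev=" + prev: 1.0}
    # WCDL-style features: the word's current class distribution, if known.
    for label, p in wcd.get(word, {}).items():
        feats["wcd=" + label] = p
    return feats

def predict(weights, feats):
    """Return the highest-scoring label (ties go to the earliest in LABELS)."""
    def score(label):
        return sum(weights.get((label, f), 0.0) * v for f, v in feats.items())
    return max(LABELS, key=score)

def train(labeled, wcd, epochs=10):
    """Toy multiclass perceptron, an assumed stand-in for the base classifier."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for words, tags in labeled:
            for i, gold in enumerate(tags):
                feats = features(words, i, wcd)
                guess = predict(weights, feats)
                if guess != gold:
                    for f, v in feats.items():
                        weights[(gold, f)] += v
                        weights[(guess, f)] -= v
    return weights

def word_class_distributions(weights, unlabeled, wcd):
    """Average the predicted labels of each word over the unlabeled corpus."""
    counts = defaultdict(lambda: defaultdict(int))
    for words in unlabeled:
        for i, w in enumerate(words):
            label = predict(weights, features(words, i, wcd))
            counts[w.lower()][label] += 1
    new_wcd = {}
    for w, c in counts.items():
        total = sum(c.values())
        new_wcd[w] = {y: c.get(y, 0) / total for y in LABELS}
    return new_wcd

def wcdl(labeled, unlabeled, rounds=3):
    """Alternate between estimating distributions and retraining."""
    wcd = {}
    weights = train(labeled, wcd)  # purely supervised starting point
    for _ in range(rounds):
        wcd = word_class_distributions(weights, unlabeled, wcd)
        weights = train(labeled, wcd)  # retrain with distributions as features
    return weights, wcd

# Toy data, invented for illustration only.
labeled = [
    (["John", "lives", "in", "Paris"], ["PER", "O", "O", "LOC"]),
    (["Mary", "visited", "Berlin"], ["PER", "O", "LOC"]),
]
unlabeled = [
    ["John", "visited", "Paris"],
    ["Mary", "lives", "in", "Berlin"],
    ["Paris", "is", "nice"],
]
weights, wcd = wcdl(labeled, unlabeled)
```

Note that the training set itself never grows in this sketch: only the feature vectors change between rounds, which is the contrast with self-training that the abstract draws.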

Yanjun Qi, Ronan Collobert, Pavel Kuksa, Koray Kavukcuoglu, Jason Weston

CIKM 2009 | Class Label Distributions | Database | Scalable Semi-supervised Method | Unlabeled Sentences |

» Combining Labeled and Unlabeled Data for MultiClass Text Categorization

» A PAC-Style Model for Learning from Labeled and Unlabeled Data

» Combining clustering and co-training to enhance text classification using unlabelled data

» Efficient Learning by Combining Confidence-Rated Classifiers to Incorporate Unlabeled Medic...

» Learning to Classify Text from Labeled and Unlabeled Documents

» Does Unlabeled Data Provably Help? Worst-case Analysis of the Sample Complexity of Semi-Super...

» Semi-Supervised Sequence Labeling with Self-Learned Features

» Boosting Statistical Word Alignment Using Labeled and Unlabeled Data

Post Info

Added	26 May 2010
Updated	26 May 2010
Type	Conference
Year	2009
Where	CIKM
Authors	Yanjun Qi, Ronan Collobert, Pavel Kuksa, Koray Kavukcuoglu, Jason Weston

Sciweavers
