Sciweavers

EMNLP
2010

Clustering-Based Stratified Seed Sampling for Semi-Supervised Relation Classification

13 years 9 months ago
Clustering-Based Stratified Seed Sampling for Semi-Supervised Relation Classification
Seed sampling is critical in semi-supervised learning. This paper proposes a clusteringbased stratified seed sampling approach to semi-supervised learning. First, various clustering algorithms are explored to partition the unlabeled instances into different strata with each stratum represented by a center. Then, diversity-motivated intra-stratum sampling is adopted to choose the center and additional instances from each stratum to form the unlabeled seed set for an oracle to annotate. Finally, the labeled seed set is fed into a bootstrapping procedure as the initial labeled data. We systematically evaluate our stratified bootstrapping approach in the semantic relation classification subtask of the ACE RDC (Relation Detection and Classification) task. In particular, we compare various clustering algorithms on the stratified bootstrapping performance. Experimental results on the ACE RDC 2004 corpus show that our clusteringbased stratified bootstrapping approach achieves the best F1-scor...
Longhua Qian, Guodong Zhou
Added 11 Feb 2011
Updated 11 Feb 2011
Type Journal
Year 2010
Where EMNLP
Authors Longhua Qian, Guodong Zhou
Comments (0)