When does Co-training Work in Real Data?



Co-training, a paradigm of semi-supervised learning, may effectively alleviate the data scarcity problem (i.e., the lack of labeled examples) in supervised learning. Standard two-view co-training requires that the dataset be described by two views of attributes, and previous theoretical studies proved that if the two views satisfy the sufficiency and independence assumptions, co-training is guaranteed to work well. However, little work has been done on how these assumptions can be empirically verified on given datasets. In this paper, we first propose novel approaches to empirically verify the two assumptions of co-training based on datasets. We then propose a simple heuristic to split a single view of attributes into two views, and discover regularity in the sufficiency and independence thresholds for the standard two-view co-training to work well. Our empirical results not only coincide well with the previous theoretical findings, but also provide a practical guideline to decide when c...
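For readers unfamiliar with the setting, the following is a minimal sketch of two-view co-training on toy data. It is not the paper's method or experimental setup: the nearest-centroid classifier is a stand-in chosen for brevity, the two classifiers take turns rather than labeling in parallel, and every name here (centroid_fit, centroid_predict, co_train, make_toy_data) is hypothetical. The toy data is built so that each view alone separates the classes, which roughly mirrors the sufficiency assumption.

```python
import random

def centroid_fit(X, y):
    """Fit a nearest-centroid classifier: one mean vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, v in enumerate(row):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def centroid_predict(model, row):
    """Return (label, margin); margin is the gap in squared distance
    between the nearest and second-nearest centroid (a crude confidence)."""
    dists = sorted((sum((a - b) ** 2 for a, b in zip(row, mu)), c)
                   for c, mu in model.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
    return dists[0][1], margin

def co_train(view1, view2, labels, rounds=10, per_round=2):
    """Simplified co-training. labels[i] is a class label or None (unlabeled).
    In each round, a classifier is trained on each view in turn and its most
    confident predictions are added to the shared labeled pool."""
    labels = list(labels)
    for _ in range(rounds):
        for view in (view1, view2):
            lab_idx = [i for i, l in enumerate(labels) if l is not None]
            unl_idx = [i for i, l in enumerate(labels) if l is None]
            if not unl_idx:
                break
            model = centroid_fit([view[i] for i in lab_idx],
                                 [labels[i] for i in lab_idx])
            scored = [(centroid_predict(model, view[i]), i) for i in unl_idx]
            scored.sort(key=lambda t: t[0][1], reverse=True)
            for (label, _), i in scored[:per_round]:
                labels[i] = label
    return labels

def make_toy_data(n=20, seed=0):
    """Toy two-class data (an assumption for illustration only): each view
    separates the classes on its own; only the first two examples are labeled."""
    rng = random.Random(seed)
    y_true = [i % 2 for i in range(n)]
    view1 = [[10.0 * c + rng.uniform(-1, 1)] for c in y_true]
    view2 = [[5.0 * c + rng.uniform(-1, 1), rng.uniform(-1, 1)] for c in y_true]
    y_part = [y_true[i] if i < 2 else None for i in range(n)]
    return view1, view2, y_true, y_part

if __name__ == "__main__":
    v1, v2, y_true, y_part = make_toy_data()
    filled = co_train(v1, v2, y_part)
    correct = sum(a == b for a, b in zip(filled, y_true))
    print(f"{correct}/{len(y_true)} labels recovered")
```

On real data the views are rarely this clean, which is the situation the abstract addresses: whether the sufficiency and independence assumptions hold well enough for such a loop to help.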

Charles X. Ling, Jun Du, Zhi-Hua Zhou


Co-training | Data Mining | Data Scarcity Problem | PAKDD 2009 | Standard Two-view Co-training


Post Info

Added	20 May 2010
Updated	20 May 2010
Type	Conference
Year	2009
Where	PAKDD
Authors	Charles X. Ling, Jun Du, Zhi-Hua Zhou
