Intra-document structural frequency features for semi-supervised domain adaptation

In this work we try to bridge the gap often encountered by researchers who have few or no labeled examples from their desired target domain, yet still have access to large amounts of labeled data from other related but distinct source domains, and seemingly no way to transfer knowledge from one to the other. Experimentally, we focus on the problem of extracting protein mentions from academic publications in the field of biology, where the source data are abstracts labeled with protein mentions, and the target domain data are wholly unlabeled captions. We mine the large number of such full-text articles freely available on the Internet to supplement the limited amount of annotated data available. By exploiting the explicit and implicit common structure of the different subsections of these documents, including the unlabeled full text, we are able to generate robust features that are insensitive to changes in marginal and conditional distributions of classes and...

Andrew Arnold, William W. Cohen

CIKM 2008 | Information Management | Protein Mentions | Target Domain | Target Domain Data

Post Info

Added	12 Oct 2010
Updated	12 Oct 2010
Type	Conference
Year	2008
Where	CIKM
Authors	Andrew Arnold, William W. Cohen
