We present two machine learning approaches to information extraction from semi-structured documents that can be used if no annotated training data are available, but there does ex...
Seed sampling is critical in semi-supervised learning. This paper proposes a clusteringbased stratified seed sampling approach to semi-supervised learning. First, various clusteri...
We propose a bootstrapping approach to training a memoriless stochastic transducer for the task of extracting transliterations from an English-Arabic bitext. The transducer learns...
Binary semantic relation extraction from Wikipedia is particularly useful for various NLP and Web applications. Currently frequent pattern miningbased methods and syntactic analysi...
Traditionally, machine learning approaches for information extraction require human annotated data that can be costly and time-consuming to produce. However, in many cases, there ...