
13 years 11 months ago
A Discriminative Candidate Generator for String Transformations
String transformation, which maps a source string s into its desirable form t, is related to various applications including stemming, lemmatization, and spelling correction. The ...
Naoaki Okazaki, Yoshimasa Tsuruoka, Sophia Ananiad...
13 years 11 months ago
Topic-Driven Multi-Document Summarization with Encyclopedic Knowledge and Spreading Activation
Information of interest to users is often distributed over a set of documents. Users can specify their request for information as a query/topic
Vivi Nastase
13 years 11 months ago
Improved Sentence Alignment on Parallel Web Pages Using a Stochastic Tree Alignment Model
Parallel web pages are an important source of training data for statistical machine translation. In this paper, we present a new approach to sentence alignment on parallel web pages....
Lei Shi, Ming Zhou
13 years 11 months ago
Understanding the Value of Features for Coreference Resolution
In recent years there has been substantial work on the important problem of coreference resolution, most of which has concentrated on the development of new models and algorithmic...
Eric Bengtson, Dan Roth
13 years 11 months ago
Arabic Named Entity Recognition using Optimized Feature Sets
The Named Entity Recognition (NER) task has been garnering significant attention in NLP as it helps improve the performance of many natural language processing applications. In th...
Yassine Benajiba, Mona T. Diab, Paolo Rosso
13 years 11 months ago
Soft-Supervised Learning for Text Classification
We propose a new graph-based semisupervised learning (SSL) algorithm and demonstrate its application to document categorization. Each document is represented by a vertex within a ...
Amarnag Subramanya, Jeff Bilmes
13 years 11 months ago
Regular Expression Learning for Information Extraction
Regular expressions have served as the dominant workhorse of practical information extraction for several years. However, there has been little work on reducing the manual effort ...
Yunyao Li, Rajasekar Krishnamurthy, Sriram Raghava...
13 years 11 months ago
Joint Unsupervised Coreference Resolution with Markov Logic
Machine learning approaches to coreference resolution are typically supervised, and require expensive labeled data. Some unsupervised approaches have been proposed (e.g., Haghighi...
Hoifung Poon, Pedro Domingos
13 years 11 months ago
Graph-based Analysis of Semantic Drift in Espresso-like Bootstrapping Algorithms
Bootstrapping has a tendency, called semantic drift, to select instances unrelated to the seed instances as the iteration proceeds. We demonstrate the semantic drift of bootstrapp...
Mamoru Komachi, Taku Kudo, Masashi Shimbo, Yuji Ma...