We describe an approach to simultaneous tokenization and part-of-speech tagging that is based on separating the closed- and open-class items, and focusing on the likelihood of the ...
Documents often have inherently parallel structure: they may consist of a text and ries, or an abstract and a body, or parts presenting alternative views on the same problem. Reve...
Most attempts to train part-of-speech taggers on a mixture of labeled and unlabeled data have failed. In this work stacked learning is used to reduce tagging to a classification t...
Developing features has been shown to be crucial to advancing the state of the art in Semantic Role Labeling (SRL). To improve Chinese SRL, we propose a set of additional features, some...
One of the central challenges in sentiment-based text categorization is that not every portion of a document is equally informative for inferring the overall sentiment of the docum...
Current Referring Expression Generation algorithms rely on domain-dependent preferences for both content selection and linguistic realization. We present two experiments showing t...
We address the problem of selecting non-domain-specific language model training data to build auxiliary language models for use in tasks such as machine translation. Our approach i...
This paper presents a novel filtration criterion to restrict the rule extraction for the hierarchical phrase-based translation model, where a bilingual but relaxed well-formed depe...
In this paper we demonstrate that there is a strong correlation between the Question Answering (QA) accuracy and the log-likelihood of the answer typing component of our statistic...
Matthias H. Heie, Edward W. D. Whittaker, Sadaoki ...