We show that the automatically induced latent variable grammars of Petrov et al. (2006) vary widely in their underlying representations, depending on their EM initialization point...
This paper investigates cross-lingual textual entailment as a semantic relation between two text portions in different languages, and proposes a prospective research direction. We...
A variety of information extraction techniques rely on the fact that instances of the same relation are "distributionally similar," in that they tend to appear in simila...
We describe a class of translation model in which a set of input variants encoded as a context-free forest is translated using a finitestate translation model. The forest structur...
Understanding query ambiguity in web search remains an important open problem. In this paper we reexamine query ambiguity by analyzing the result clickthrough data. Previously pro...
We present three approaches for unsupervised grammar induction that are sensitive to data complexity and apply them to Klein and Manning's Dependency Model with Valence. The ...
Valentin I. Spitkovsky, Hiyan Alshawi, Daniel Jura...
Language identification is the task of identifying the language a given document is written in. This paper describes a detailed examination of what models perform best under diffe...
We address a key problem in grapheme-tophoneme conversion: the ambiguity in mapping grapheme units to phonemes. Rather than using single letters and phonemes as units, we propose ...
This paper investigates using prosodic information in the form of ToBI break indexes for parsing spontaneous speech. We revisit two previously studied approaches, one that hurt pa...
Named entity recognition systems sometimes have difficulty when applied to data from domains that do not closely match the training data. We first use a simple rule-based techniqu...
Asad B. Sayeed, Timothy J. Meyer, Hieu C. Nguyen, ...