
13 years 4 months ago
Products of Random Latent Variable Grammars
We show that the automatically induced latent variable grammars of Petrov et al. (2006) vary widely in their underlying representations, depending on their EM initialization point...
Slav Petrov
13 years 4 months ago
Towards Cross-Lingual Textual Entailment
This paper investigates cross-lingual textual entailment as a semantic relation between two text portions in different languages, and proposes a prospective research direction. We...
Yashar Mehdad, Matteo Negri, Marcello Federico
13 years 4 months ago
Improved Extraction Assessment through Better Language Models
A variety of information extraction techniques rely on the fact that instances of the same relation are "distributionally similar," in that they tend to appear in simila...
Arun Ahuja, Doug Downey
13 years 4 months ago
Context-free reordering, finite-state translation
We describe a class of translation model in which a set of input variants encoded as a context-free forest is translated using a finite-state translation model. The forest structur...
Christopher Dyer, Philip Resnik
13 years 4 months ago
Query Ambiguity Revisited: Clickthrough Measures for Distinguishing Informational and Ambiguous Queries
Understanding query ambiguity in web search remains an important open problem. In this paper we reexamine query ambiguity by analyzing the result clickthrough data. Previously pro...
Yu Wang, Eugene Agichtein
13 years 4 months ago
From Baby Steps to Leapfrog: How "Less is More" in Unsupervised Dependency Parsing
We present three approaches for unsupervised grammar induction that are sensitive to data complexity and apply them to Klein and Manning's Dependency Model with Valence. The ...
Valentin I. Spitkovsky, Hiyan Alshawi, Daniel Jura...
13 years 4 months ago
Language Identification: The Long and the Short of the Matter
Language identification is the task of identifying the language a given document is written in. This paper describes a detailed examination of what models perform best under diffe...
Timothy Baldwin, Marco Lui
13 years 4 months ago
An MDL-based approach to extracting subword units for grapheme-to-phoneme conversion
We address a key problem in grapheme-to-phoneme conversion: the ambiguity in mapping grapheme units to phonemes. Rather than using single letters and phonemes as units, we propose ...
Sravana Reddy, John A. Goldsmith
13 years 4 months ago
Appropriately Handled Prosodic Breaks Help PCFG Parsing
This paper investigates using prosodic information in the form of ToBI break indexes for parsing spontaneous speech. We revisit two previously studied approaches, one that hurt pa...
Zhongqiang Huang, Mary P. Harper
13 years 4 months ago
Crowdsourcing the evaluation of a domain-adapted named entity recognition system
Named entity recognition systems sometimes have difficulty when applied to data from domains that do not closely match the training data. We first use a simple rule-based techniqu...
Asad B. Sayeed, Timothy J. Meyer, Hieu C. Nguyen, ...