We present three approaches for unsupervised grammar induction that are sensitive to data complexity and apply them to Klein and Manning's Dependency Model with Valence. The ...
Valentin I. Spitkovsky, Hiyan Alshawi, Daniel Jura...
Language identification is the task of identifying the language a given document is written in. This paper describes a detailed examination of what models perform best under diffe...
We address a key problem in grapheme-tophoneme conversion: the ambiguity in mapping grapheme units to phonemes. Rather than using single letters and phonemes as units, we propose ...
This paper investigates using prosodic information in the form of ToBI break indexes for parsing spontaneous speech. We revisit two previously studied approaches, one that hurt pa...
Named entity recognition systems sometimes have difficulty when applied to data from domains that do not closely match the training data. We first use a simple rule-based techniqu...
Asad B. Sayeed, Timothy J. Meyer, Hieu C. Nguyen, ...
We describe tree edit models for representing sequences of tree transformations involving complex reordering phenomena and demonstrate that they offer a simple, intuitive, and eff...
With the increase in popularity of online review sites comes a corresponding need for tools capable of extracting the information most important to the user from the plain text da...
This paper presents an empirical comparison of similarity measures for pairs of concepts based on Information Content. It shows that using modest amounts of untagged text to deriv...
The task of identifying the language of text or utterances has a number of applications in natural language processing. Language identification has traditionally been approached w...
The use of well-nested linear context-free rewriting systems has been empirically motivated for modeling of the syntax of languages with discontinuous constituents or relatively f...