The AMI Meeting Corpus is now publicly available, including manual annotation files generated in the NXT XML format, but lacking explicit metadata for the 171 meetings of the cor...
We propose a succinct randomized language model which employs a perfect hash function to encode fingerprints of n-grams and their associated probabilities, backoff weights, or oth...
This paper presents a comparative study of five parameter estimation algorithms on four NLP tasks. Three of the five algorithms are well-known in the computational linguistics com...
Jianfeng Gao, Galen Andrew, Mark Johnson, Kristina...
Among syntax-based translation models, the tree-based approach, which takes as input a parse tree of the source sentence, is a promising direction, being faster and simpler than it...
We address two problems in the field of automatic optimization of dialogue strategies: learning effective dialogue strategies when no initial data or system exists, and evaluating...
We apply the hypothesis of "One Sense Per Discourse" (Yarowsky, 1995) to information extraction (IE), and extend the scope of "discourse" from one single docum...
Dependency-based representations of natural language syntax require a fine balance between structural flexibility and computational complexity. In previous work, several constra...
We present a method to transliterate names in the framework of end-to-end statistical machine translation. The system is trained to learn when to transliterate. For Arabic to Engl...