We formulate the problem of nonprojective dependency parsing as a polynomial-sized integer linear program. Our formulation is able to handle non-local output features in an effici...
Tree-based translation models, which exploit the linguistic syntax of source language, usually separate decoding into two steps: parsing and translation. Although this separation ...
Unrehearsed spoken language often contains disfluencies. In order to correctly interpret a spoken utterance, any such disfluencies must be identified and removed or otherwise deal...
In this paper we look at the problem of cleansing noisy text using a statistical machine translation model. Noisy text is produced in informal communications such as Short Message...
Danish Contractor, Tanveer A. Faruquie, L. Venkata...
We present a new edition of the Google Books Ngram Corpus, which describes how often words and phrases were used over a period of five centuries, in eight languages; it reflects...