A distributed system is described that reliably mines parallel text from large corpora. The approach can be regarded as cross-language near-duplicate detection, enabled by an init...
Jakob Uszkoreit, Jay Ponte, Ashok C. Popat, Moshe ...
Certain distinctions made in the lexicon of one language may be redundant when translating into another language. We quantify redundancy among source types by the similarity of th...
Traditional 1-best translation pipelines suffer a major drawback: the errors of 1best outputs, inevitably introduced by each module, will propagate and accumulate along the pipeli...
Syntactic word reordering is essential for translations across different grammar structures between syntactically distant languagepairs. In this paper, we propose to embed local a...
The parameters of statistical translation models are typically estimated from sentence-aligned parallel corpora. We show that significant improvements in the alignment and transla...