

Large Language Models in Machine Translation

This paper reports on the benefits of large-scale statistical language modeling in machine translation. We propose a distributed infrastructure that we use to train on up to 2 trillion tokens, resulting in language models with up to 300 billion n-grams. The infrastructure provides smoothed probabilities for fast, single-pass decoding. We introduce a new smoothing method, dubbed Stupid Backoff, that is inexpensive to train on large data sets and approaches the quality of Kneser-Ney smoothing as the amount of training data increases.
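The Stupid Backoff rule mentioned in the abstract scores an n-gram with plain relative frequencies and, whenever the full n-gram is unseen, falls back to a shorter context scaled by a fixed factor; the result is a score rather than a normalized probability. Below is a minimal in-memory Python sketch of that scoring rule, not the paper's distributed implementation; the helper names (`count_ngrams`, `stupid_backoff_score`) and the toy corpus are illustrative, and the backoff factor of 0.4 follows the constant reported in the paper.

```python
from collections import defaultdict

ALPHA = 0.4  # fixed backoff factor, as reported in the paper


def count_ngrams(tokens, max_order):
    """Count all n-grams of order 1..max_order in a list of tokens."""
    counts = defaultdict(int)
    for order in range(1, max_order + 1):
        for i in range(len(tokens) - order + 1):
            counts[tuple(tokens[i:i + order])] += 1
    return dict(counts)


def stupid_backoff_score(counts, total_tokens, context, word, alpha=ALPHA):
    """Stupid Backoff score S(word | context).

    Uses relative frequencies instead of normalized probabilities:
        S(w | c) = count(c + w) / count(c)     if count(c + w) > 0
                 = alpha * S(w | shorter c)    otherwise,
    bottoming out at the unigram relative frequency count(w) / N.
    """
    context = tuple(context)
    if not context:
        # Base case: unigram relative frequency over the whole corpus.
        return counts.get((word,), 0) / total_tokens
    ngram = context + (word,)
    if counts.get(ngram, 0) > 0:
        return counts[ngram] / counts[context]
    # Unseen n-gram: drop the leftmost context word and scale by alpha.
    return alpha * stupid_backoff_score(counts, total_tokens, context[1:], word, alpha)


if __name__ == "__main__":
    tokens = "the cat sat on the mat and the cat ran".split()
    counts = count_ngrams(tokens, max_order=3)
    # Seen trigram: direct relative frequency count("the cat sat") / count("the cat").
    print(stupid_backoff_score(counts, len(tokens), ("the", "cat"), "sat"))
    # Unseen trigram: backs off twice, down to the unigram frequency of "the".
    print(stupid_backoff_score(counts, len(tokens), ("cat", "sat"), "the"))
```

Because the scores need not sum to one, no normalization statistics are required at training time, which is what makes the method cheap to apply to very large corpora.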
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2007
Where EMNLP
Authors Thorsten Brants, Ashok C. Popat, Peng Xu, Franz Josef Och, Jeffrey Dean