Given Bilingual Terminology in Statistical Machine Translation: MWE-Sensitive Word Alignment and Hierarchical Pitman-Yor Process-


Download www.cngl.ie

This paper considers a scenario in which we are given almost perfect knowledge of the bilingual terminology of a test corpus in Statistical Machine Translation (SMT). When the given terminology is part of the training corpus, one natural strategy in SMT is to use the trained translation model and ignore the given terminology. This raises two questions. 1) Can a word aligner capture the given terminology? This matters because, even if the terminology is in the training corpus, the resulting translation model often does not include these terms. 2) Are the probabilities in the translation model calculated correctly? To answer these questions, we conducted experiments introducing a Multi-Word Expression-sensitive (MWE-sensitive) word aligner and hierarchical Pitman-Yor process-based smoothing of the translation model. Using the 200k JP–EN NTCIR corpus, our experimental results show that if we introduce an MWE-sensitive word aligner and a new translation model smoothing, the
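
The abstract does not say how the smoothing is constructed, so the following is only a minimal sketch of generic hierarchical Pitman-Yor smoothing for a lexical translation probability p(e | f), not the authors' method. It assumes a two-level hierarchy (p(e | f) backs off to an English unigram p(e), which in turn backs off to a uniform distribution over a vocabulary of assumed size), fixed discount d and strength θ rather than sampled ones, and the "one table per word type" approximation, under which the estimator behaves like interpolated Kneser-Ney counting. The class name HPYTranslationSmoother and its input format (a list of aligned (f, e) word pairs, e.g. produced by a word aligner) are hypothetical.

```python
from collections import Counter, defaultdict


class HPYTranslationSmoother:
    """Hypothetical two-level hierarchical Pitman-Yor smoother for p(e | f).

    Assumes the 'one table per word type' approximation: in every
    restaurant, t_w = min(1, c_w) and the total table count is the
    number of distinct types seen there.
    """

    def __init__(self, pairs, vocab_size, d=0.5, theta=1.0):
        if not (0.0 <= d < 1.0 and theta > -d):
            raise ValueError("need 0 <= d < 1 and theta > -d")
        self.vocab_size = vocab_size
        self.d = d
        self.theta = theta
        # Restaurant per source word f: counts of aligned target words e.
        self.pair_counts = defaultdict(Counter)
        for f, e in pairs:
            self.pair_counts[f][e] += 1
        # Each table in an f-restaurant sends one customer to the shared
        # unigram restaurant; with one table per type, that is one per (f, e) type.
        self.unigram = Counter()
        for counter in self.pair_counts.values():
            for e in counter:
                self.unigram[e] += 1

    def _py(self, counts, w, base_p):
        """Pitman-Yor predictive probability in Chinese-restaurant form."""
        total = sum(counts.values())
        if total == 0:
            return base_p  # empty restaurant: defer entirely to the base
        c_w = counts.get(w, 0)
        t_w = 1 if c_w > 0 else 0
        t_total = len(counts)
        return (c_w - self.d * t_w
                + (self.theta + self.d * t_total) * base_p) / (self.theta + total)

    def p_unigram(self, e):
        # Level 1: unigram p(e), backing off to uniform 1/|V|.
        return self._py(self.unigram, e, 1.0 / self.vocab_size)

    def p_translation(self, e, f):
        # Level 2: p(e | f), backing off to the unigram p(e).
        return self._py(self.pair_counts.get(f, Counter()), e, self.p_unigram(e))


if __name__ == "__main__":
    # Toy, made-up alignment pairs (romanized Japanese -> English).
    pairs = [("inu", "dog"), ("inu", "dog"), ("inu", "hound"), ("neko", "cat")]
    sm = HPYTranslationSmoother(pairs, vocab_size=4)
    print(sm.p_translation("dog", "inu"))  # 0.515625
```

Because the discounted mass (θ + d·t)/(θ + c) is always handed to the level below, p(e | f) sums to one over the vocabulary, and an unseen target word still receives a small probability via the unigram and uniform levels instead of zero.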

Tsuyoshi Okita, Andy Way


Artificial Intelligence | FLAIRS 2011 | Natural Strategy | Statistical Machine Translation | Translation Model |


Added	28 Aug 2011
Updated	28 Aug 2011
Type	Journal
Year	2011
Where	FLAIRS
Authors	Tsuyoshi Okita, Andy Way

Sciweavers
