Enriching Statistical Translation Models Using a Domain-Independent Multilingual Lexical Knowledge Base



This paper presents a method for improving phrase-based Statistical Machine Translation systems by enriching the original translation model with information derived from a multilingual lexical knowledge base. The proposed method exploits the Multilingual Central Repository (a group of linked WordNets for different languages) as a domain-independent knowledge base to provide translation models with new candidate translations for a large set of lexical tokens. Translation probabilities for these tokens are estimated using a set of simple heuristics based on WordNet topology and local context. During decoding, these probabilities are softly integrated so they can interact with other statistical models. We have applied this type of domain-independent translation modeling to several translation tasks, obtaining a moderate but significant improvement in translation quality that is consistent across a number of standard automatic evaluation metrics. This improvement is especially remarka...
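
The pipeline summarized in the abstract can be illustrated with a rough sketch. The abstract does not spell out the heuristics or the decoder integration, so everything below is an assumption made for illustration: the toy knowledge base `TOY_KB` and its entries are invented, `kb_translation_probs` uses a simple uniform split over senses as a stand-in for the paper's WordNet-topology and context heuristics, and `loglinear_score` shows the generic log-linear combination commonly used in phrase-based decoders, not necessarily the authors' exact formulation.

```python
import math
from collections import defaultdict

# Hypothetical toy knowledge base: synset id -> words per language.
# The entries are invented for illustration and are not taken from the
# Multilingual Central Repository.
TOY_KB = {
    "bank.n.01": {"en": ["bank"], "es": ["banco"]},
    "bank.n.02": {"en": ["bank", "shore"], "es": ["orilla", "ribera"]},
}


def kb_translation_probs(word, src="en", tgt="es", kb=TOY_KB):
    """Estimate P(target | word) from the knowledge base.

    Assumed heuristic (not the paper's): split the probability mass
    uniformly over the senses that contain `word`, then uniformly over
    the target-language words of each sense.
    """
    senses = [
        entry for entry in kb.values()
        if word in entry.get(src, []) and entry.get(tgt)
    ]
    probs = defaultdict(float)
    for entry in senses:
        targets = entry[tgt]
        for target in targets:
            probs[target] += 1.0 / (len(senses) * len(targets))
    return dict(probs)


def loglinear_score(features, weights):
    """Weighted sum of log-probabilities, one term per model.

    Treating the knowledge-base probability as one more weighted feature
    is one way to integrate it "softly": it influences the score without
    overriding the other models.
    """
    return sum(weights[name] * math.log(p) for name, p in features.items())


if __name__ == "__main__":
    # Illustrative weights and phrase-table probability, chosen arbitrarily.
    weights = {"phrase_table": 1.0, "kb": 0.5}
    for target, p_kb in sorted(kb_translation_probs("bank").items()):
        score = loglinear_score({"phrase_table": 0.2, "kb": p_kb}, weights)
        print(target, round(score, 3))
```

In a real system the feature weights would be tuned on held-out data rather than fixed by hand, and the knowledge-base probabilities would be supplied to the decoder alongside the original phrase-table scores.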

Miguel García, Jesús Giménez, Lluís Màrquez


CICLING 2009 | Natural Language Processing | Original Translation Model | Statistical Machine Translation | Translation Probabilities


Post Info

Added	24 Nov 2009
Updated	24 Nov 2009
Type	Conference
Year	2009
Where	CICLING
Authors	Miguel García, Jesús Giménez, Lluís Màrquez
