Sciweavers

ACL
2010

Pseudo-Word for Phrase-Based Machine Translation

13 years 9 months ago
Pseudo-Word for Phrase-Based Machine Translation
The pipeline of most Phrase-Based Statistical Machine Translation (PB-SMT) systems starts from automatically word aligned parallel corpus. But word appears to be too fine-grained in some cases such as non-compositional phrasal equivalences, where no clear word alignments exist. Using words as inputs to PBSMT pipeline has inborn deficiency. This paper proposes pseudo-word as a new start point for PB-SMT pipeline. Pseudo-word is a kind of basic multi-word expression that characterizes minimal sequence of consecutive words in sense of translation. By casting pseudo-word searching problem into a parsing framework, we search for pseudo-words in a monolingual way and a bilingual synchronous way. Experiments show that pseudo-word significantly outperforms word for PB-SMT model in both travel translation domain and news translation domain.
Xiangyu Duan, Min Zhang, Haizhou Li
Added 10 Feb 2011
Updated 10 Feb 2011
Type Journal
Year 2010
Where ACL
Authors Xiangyu Duan, Min Zhang, Haizhou Li
Comments (0)