Pseudo-Word for Phrase-Based Machine Translation

Download www.aclweb.org

The pipeline of most Phrase-Based Statistical Machine Translation (PB-SMT) systems starts from an automatically word-aligned parallel corpus. However, words appear to be too fine-grained in some cases, such as non-compositional phrasal equivalences, where no clear word alignments exist. Using words as inputs to the PB-SMT pipeline therefore has an inherent deficiency. This paper proposes the pseudo-word as a new starting point for the PB-SMT pipeline. A pseudo-word is a kind of basic multi-word expression that characterizes a minimal sequence of consecutive words in the sense of translation. By casting the pseudo-word search problem into a parsing framework, we search for pseudo-words both in a monolingual way and in a bilingual synchronous way. Experiments show that pseudo-words significantly outperform words for the PB-SMT model in both the travel and the news translation domains.
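The abstract does not specify the search procedure. As a rough, non-authoritative illustration of what grouping a sentence into minimal units of consecutive words can look like, the sketch below segments a sentence into non-overlapping spans with a simple dynamic program that maximizes a summed span score. Everything here is an assumption: the function names, the `max_len` limit, and the scorer (`toy_span_score`, backed by a made-up table `TOY_UNITS`) are hypothetical. This is a plain segmentation DP standing in for the authors' parsing-based search, and it does not model the bilingual synchronous variant.

```python
from typing import Callable, List, Sequence, Tuple

Span = Tuple[str, ...]

# Hypothetical toy scores (not from the paper): a bonus for multi-word
# units assumed to translate as a whole.
TOY_UNITS = {("kick", "the", "bucket"): 2.0, ("new", "york"): 1.5}


def toy_span_score(span: Span) -> float:
    """Hypothetical scorer: single words score 0, listed units get their
    bonus, and any other multi-word span is disallowed (-inf)."""
    if len(span) == 1:
        return 0.0
    return TOY_UNITS.get(span, float("-inf"))


def segment_into_pseudo_words(
    words: Sequence[str],
    span_score: Callable[[Span], float],
    max_len: int = 4,
) -> List[Span]:
    """Split `words` into consecutive, non-overlapping spans of at most
    `max_len` words that maximize the summed span score.

    Assumes every single word gets a finite score, so at least one
    segmentation always exists.
    """
    n = len(words)
    # best[i]: best total score for the prefix words[:i]
    # back[i]: start index of the last span in that best segmentation
    best = [float("-inf")] * (n + 1)
    back = [0] * (n + 1)
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            score = best[start] + span_score(tuple(words[start:end]))
            if score > best[end]:
                best[end] = score
                back[end] = start
    # Follow back-pointers from the end of the sentence to recover spans.
    spans: List[Span] = []
    end = n
    while end > 0:
        start = back[end]
        spans.append(tuple(words[start:end]))
        end = start
    spans.reverse()
    return spans
```

With the toy table, `segment_into_pseudo_words("he will kick the bucket in new york".split(), toy_span_score)` returns `[('he',), ('will',), ('kick', 'the', 'bucket'), ('in',), ('new', 'york')]`: the two listed units become single spans and every other word stays on its own. A realistic scorer would have to be derived from corpus statistics rather than a hand-written table.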

Xiangyu Duan, Min Zhang, Haizhou Li

ACL 2010 | Computational Linguistics | Non-compositional Phrasal Equivalences | Phrase-based Statistical Machine | Translation Domain |

Post Info

Added	10 Feb 2011
Updated	10 Feb 2011
Type	Journal
Year	2010
Where	ACL
Authors	Xiangyu Duan, Min Zhang, Haizhou Li

Sciweavers