Extracting Parallel Sub-Sentential Fragments from Non-Parallel Corpora



We present a novel method for extracting parallel sub-sentential fragments from comparable, non-parallel bilingual corpora. By analyzing potentially similar sentence pairs using a signal processing-inspired approach, we detect which segments of the source sentence are translated into segments in the target sentence, and which are not. This method enables us to extract useful machine translation training data even from very non-parallel corpora, which contain no parallel sentence pairs. We evaluate the quality of the extracted data by showing that it improves the performance of a state-of-the-art statistical machine translation system.
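
The abstract names the approach only as "signal processing-inspired" and gives no further detail, so the following Python sketch is one possible reading of that idea, not the authors' implementation. It assumes a hypothetical probabilistic lexicon (`lexicon`, a dict from source word to target-word scores), assigns each source word a positive value if some translation appears in the target sentence and a negative value otherwise, smooths that per-word signal with a moving-average filter, and keeps runs of words whose smoothed value stays positive. The names `smooth`, `extract_fragments`, `window`, `min_len`, and `miss_score` are all illustrative assumptions.

```python
from typing import Dict, List


def smooth(signal: List[float], window: int = 5) -> List[float]:
    """Moving-average filter; the window is clipped at the sentence edges."""
    half = window // 2
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - half), min(len(signal), i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def extract_fragments(
    source: List[str],
    target: List[str],
    lexicon: Dict[str, Dict[str, float]],  # hypothetical: source word -> {target word: score}
    window: int = 5,
    min_len: int = 3,
    miss_score: float = -1.0,  # assumed penalty for a word with no translation in target
) -> List[List[str]]:
    """Return source-side word runs that appear to be translated in `target`."""
    target_set = set(target)

    # Raw signal: best lexicon score among translations present in the target,
    # or a negative value when no known translation is present.
    raw = []
    for word in source:
        scores = [s for t, s in lexicon.get(word, {}).items() if t in target_set]
        raw.append(max(scores) if scores else miss_score)

    filtered = smooth(raw, window)

    # Keep maximal runs where the smoothed signal is positive and long enough.
    fragments = []
    start = None
    for i, value in enumerate(filtered + [-1.0]):  # trailing sentinel closes the last run
        if value > 0 and start is None:
            start = i
        elif value <= 0 and start is not None:
            if i - start >= min_len:
                fragments.append(source[start:i])
            start = None
    return fragments
```

Calling `extract_fragments(source_tokens, target_tokens, lexicon)` returns a list of token lists from the source side. A full extractor would presumably run the same procedure in the other direction to obtain the matching target-side fragments; that step is omitted here.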

Dragos Stefan Munteanu, Daniel Marcu


ACL 2006 | ACL 2007 | Parallel Sentence Pairs | Sentence Pairs | Similar Sentence Pairs


Added	30 Oct 2010
Updated	30 Oct 2010
Type	Conference
Year	2006
Where	ACL
Authors	Dragos Stefan Munteanu, Daniel Marcu
