Sciweavers

ACL
2006

Extracting Parallel Sub-Sentential Fragments from Non-Parallel Corpora

14 years 27 days ago
Extracting Parallel Sub-Sentential Fragments from Non-Parallel Corpora
We present a novel method for extracting parallel sub-sentential fragments from comparable, non-parallel bilingual corpora. By analyzing potentially similar sentence pairs using a signal processinginspired approach, we detect which segments of the source sentence are translated into segments in the target sentence, and which are not. This method enables us to extract useful machine translation training data even from very non-parallel corpora, which contain no parallel sentence pairs. We evaluate the quality of the extracted data by showing that it improves the performance of a state-of-the-art statistical machine translation system.
Dragos Stefan Munteanu, Daniel Marcu
Added 30 Oct 2010
Updated 30 Oct 2010
Type Conference
Year 2006
Where ACL
Authors Dragos Stefan Munteanu, Daniel Marcu
Comments (0)