Creating Sentence-Aligned Parallel Text Corpora from a Large Archive of Potential Parallel Text using BITS and Champollion

Parallel text is one of the most valuable resources for the development of statistical machine translation systems and other NLP applications. The Linguistic Data Consortium (LDC) has supported research on statistical machine translation and other NLP applications by creating and distributing a large amount of parallel text resources for the research communities. However, manual translation is very costly, and the number of known providers that offer complete parallel text is limited. This paper presents a cost-effective approach to identifying parallel document pairs from sources that provide potential parallel text

Kazuaki Maeda, Xiaoyi Ma, Stephanie Strassel

Education | LREC 2008 | Parallel Text | Parallel Text Resources | Statistical Machine Translations |

Type	Conference
Year	2008
Where	LREC
Authors	Kazuaki Maeda, Xiaoyi Ma, Stephanie Strassel
