ICCPOL
2009
Springer

Constructing Parallel Corpus from Movie Subtitles

Abstract. This paper describes a methodology for constructing aligned German-Chinese corpora from movie subtitles. The corpora will be used to train a specialized machine translation system intended to translate subtitles automatically between German and Chinese. Because the common length-based alignment algorithm performs poorly on short spoken sentences, especially for languages from different families, this paper studies the use of dynamic programming based on the time-shift information in subtitles and extends it with statistical lexical cues to align the subtitles. In our experiment with around 4,000 Chinese and German sentences, the proposed alignment approach yields 83.8% precision. Furthermore, it is language-independent and leads to a general method for building parallel corpora between different language families.
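
The idea of aligning subtitles by dynamic programming over their timing can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`time_distance`, `align_by_time`), the cost function, and the `skip_cost` value are assumptions made for illustration, and the statistical lexical cues mentioned in the abstract are omitted. Each subtitle cue is reduced to a (start, end) pair in seconds.

```python
from typing import List, Tuple

Cue = Tuple[float, float]  # (start_seconds, end_seconds); hypothetical representation


def time_distance(a: Cue, b: Cue) -> float:
    """Assumed cost of pairing two cues: how far their start and end times differ."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def align_by_time(src: List[Cue], tgt: List[Cue],
                  skip_cost: float = 2.0) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) of cues aligned by an edit-distance style DP.

    skip_cost is an assumed penalty for leaving a cue unaligned.
    """
    n, m = len(src), len(tgt)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[""] * (m + 1) for _ in range(n + 1)]

    # Border cells: the only way to consume cues is to skip them.
    for i in range(1, n + 1):
        cost[i][0], move[i][0] = i * skip_cost, "skip_src"
    for j in range(1, m + 1):
        cost[0][j], move[0][j] = j * skip_cost, "skip_tgt"

    # Fill the table with the cheapest of: pair the cues, or skip one of them.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (cost[i - 1][j - 1] + time_distance(src[i - 1], tgt[j - 1]), "match"),
                (cost[i - 1][j] + skip_cost, "skip_src"),
                (cost[i][j - 1] + skip_cost, "skip_tgt"),
            ]
            cost[i][j], move[i][j] = min(options)

    # Walk back from the final cell to recover the aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "match":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif step == "skip_src":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs


if __name__ == "__main__":
    # Made-up timings: the middle cue of each track has no close counterpart.
    german = [(1.0, 3.0), (4.0, 6.0), (10.0, 12.0)]
    chinese = [(1.2, 3.1), (7.0, 8.0), (10.1, 12.2)]
    print(align_by_time(german, chinese))
```

In this sketch a pair of cues is aligned only when pairing them is cheaper than skipping both, so `skip_cost` acts as a tolerance on timing mismatch. A lexical score could be folded into `time_distance` to approximate the extension the abstract describes.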

Han Xiao, Xiaojie Wang

Aligned German-Chinese Corpora | Alignment Shows Weakness | ICCPOL 2009 | Parallel Corpora Building

Related Content

» Using Movie Subtitles for Creating a Large-Scale Bilingual Corpora

» Synchronizing Translated Movie Subtitles

» Evaluating Utility of Data Sources in a Large Parallel Czech-English Corpus CzEng 0.9

» Creating a Reusable English-Chinese Parallel Corpus for Bilingual Dictionary Construction

» Constructing the CODA Corpus: A Parallel Corpus of Monologues and Expository Dialogues

» Parallel Massive Processing in SuperMatrix a General Tool for Distributional Semantic Ana...

» Generating Expository Dialogue from Monologue Motivation Corpus and Preliminary Rules

» Discovering Parallel Text from the World Wide Web

» Xoom a tool for zooming in and out of XML documents

Post Info

Added	25 Jul 2010
Updated	25 Jul 2010
Type	Conference
Year	2009
Where	ICCPOL
Authors	Han Xiao, Xiaojie Wang
