Creating a Reusable English-Chinese Parallel Corpus for Bilingual Dictionary Construction


Download www.lrec-conf.org

This paper first describes an experiment to construct an English-Chinese parallel corpus, then applies the Uplug word alignment tool to the corpus, and finally produces and evaluates an English-Chinese word list. The Stockholm English-Chinese Parallel Corpus (SEC) was created by downloading English-Chinese parallel texts from a Chinese website containing law texts that had been manually translated from Chinese into English. The Chinese side of the corpus contains 104 563 Chinese characters, equivalent to 59 918 Chinese words, and the corresponding English side contains 75 766 English words. However, Chinese writing does not use delimiters to mark word boundaries, so we had to carry out word segmentation on the Chinese corpus as a preprocessing step. Moreover, since the parallel corpus was downloaded from the Internet, it is noisy with respect to the alignment between corresponding translated sentences. We therefore spent 60 hours of manual work aligning the sentences in the English and Chinese...
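The abstract says the Chinese text had to be segmented into words before alignment, but it does not name the segmenter used. The figures it reports imply roughly 1.75 characters per segmented word (104 563 / 59 918). As an illustration only, the sketch below shows one classic dictionary-based technique, forward maximum matching, which is not necessarily the authors' method. The function name `forward_maximum_matching`, the `max_len` parameter, and the toy lexicon are hypothetical.

```python
def forward_maximum_matching(text: str, lexicon: set[str], max_len: int = 4) -> list[str]:
    """Hypothetical greedy segmenter (not the paper's tool).

    Scans left to right; at each position it takes the longest lexicon
    entry of at most max_len characters, falling back to a single
    character when no longer entry matches.
    """
    tokens: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a match is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the scan never stalls.
            if size == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += size
                break
    return tokens


if __name__ == "__main__":
    # Toy lexicon for illustration only; a real segmenter needs a large dictionary.
    lexicon = {"中华人民共和国", "宪法", "人民", "共和国"}
    print(forward_maximum_matching("中华人民共和国宪法", lexicon, max_len=7))
```

Running it prints `['中华人民共和国', '宪法']`. Greedy longest-match is simple and fast, but it can mis-segment ambiguous strings. That is one reason segmentation quality matters for the downstream word alignment.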

Hercules Dalianis, Hao-chun Xing, Xin Zhang


Education | English-Chinese Parallel Corpus | English-Chinese Word List | LREC 2010 | Parallel Corpus


Post Info

Added	29 Jan 2011
Updated	29 Jan 2011
Type	Conference
Year	2010
Where	LREC
Authors	Hercules Dalianis, Hao-chun Xing, Xin Zhang


Sciweavers
