Mining Parenthetical Translations from the Web by Word Alignment

Source: www.aclweb.org

Documents in languages such as Chinese, Japanese and Korean sometimes annotate terms with their English translations inside a pair of parentheses. We present a method to extract such translations from a large collection of web documents by building a partially parallel corpus and using a word alignment algorithm to identify the terms being translated. The method is able to generalize across the translations of different terms and can reliably extract translations that occur only once on the entire web. Our experiment on Chinese web pages produced more than 26 million translation pairs, over two orders of magnitude more than previous results. We show that adding the extracted translation pairs as training data significantly increases the BLEU score of a statistical machine translation system.
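The idea can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a simple regex for spotting a run of Chinese characters followed by parenthesized English, keeps a fixed window of preceding characters because the left edge of the term is unknown, and swaps in plain IBM Model 1 with a NULL word as the alignment model: characters that align to NULL are treated as lying outside the translated term. The names `PAREN_RE`, `extract_candidates`, `train_model1`, `term_for`, `NULL` and `max_prefix` are all hypothetical.

```python
import re
from collections import defaultdict

NULL = "<null>"  # hypothetical empty word for characters outside the term

# Assumed pattern: a run of CJK ideographs followed by Latin text in ASCII "()"
# or full-width parentheses. Real web text needs far more cleanup than this.
PAREN_RE = re.compile(
    r"([\u4e00-\u9fff]+)\s*[(\uff08]\s*([A-Za-z][A-Za-z \-]*?)\s*[)\uff09]"
)


def extract_candidates(text, max_prefix=8):
    """Return (chinese_chars, english_words) candidate pairs from one document.

    The left boundary of the Chinese term is unknown, so up to max_prefix
    characters before the parenthesis are kept and alignment decides later.
    """
    pairs = []
    for m in PAREN_RE.finditer(text):
        zh = list(m.group(1)[-max_prefix:])
        en = m.group(2).lower().split()
        pairs.append((zh, en))
    return pairs


def train_model1(pairs, iterations=10):
    """IBM Model 1 EM estimating t[(zh_char, en_word)] = P(zh_char | en_word)."""
    zh_vocab = {c for zh, _ in pairs for c in zh}
    t = defaultdict(lambda: 1.0 / len(zh_vocab))  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for zh, en in pairs:
            en = en + [NULL]  # every character may also align to NULL
            for c in zh:
                norm = sum(t[(c, e)] for e in en)
                for e in en:
                    frac = t[(c, e)] / norm  # expected alignment count (E-step)
                    count[(c, e)] += frac
                    total[e] += frac
        # M-step: renormalize so that t[(., e)] sums to 1 for each English word e
        t = defaultdict(float, {k: v / total[k[1]] for k, v in count.items()})
    return t


def term_for(zh, en, t):
    """Keep the longest suffix of zh whose characters all align to real English words."""
    start = len(zh)
    for i in range(len(zh) - 1, -1, -1):
        best = max(en + [NULL], key=lambda e: t[(zh[i], e)])
        if best == NULL:
            break
        start = i
    return "".join(zh[start:])
```

For example, `extract_candidates("我们使用机器翻译（Machine Translation）技术")` yields one pair whose English side is `["machine", "translation"]`; after `train_model1` runs over many such pairs, `term_for` trims the Chinese window to the characters the model aligns to English words. Pooling candidates from many documents is what lets a rare pair borrow evidence from other pairs that share its words.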

Dekang Lin, Shaojun Zhao, Benjamin Van Durme, Marius Pasca

ACL 2008 | Annotate Terms | Computational Linguistics | Translations | Word Alignment Algorithm

» Mining Bilingual Data from the Web with Adaptively Learnt Patterns

» Clickthrough-based translation models for web search: from word models to phrase models

» A DOM Tree Alignment Model for Mining Parallel Data from the Web

» Improved Sentence Alignment on Parallel Web Pages Using a Stochastic Tree Alignment Model

» An Empirical Study on Web Mining of Parallel Data

» Creating a Reusable English-Chinese Parallel Corpus for Bilingual Dictionary Construction

» Cross-Media Alignment of Names and Faces

» Automatically Harvesting Katakana-English Term Pairs from Search Engine Query Logs

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2008
Where	ACL
Authors	Dekang Lin, Shaojun Zhao, Benjamin Van Durme, Marius Pasca

Sciweavers
