Finding translations for low-frequency words in comparable corpora


Download clg.wlv.ac.uk

Abstract. The paper proposes a method to improve the extraction of low-frequency translation equivalents from comparable corpora. Before mapping between the vector spaces of different languages, the method models the context vectors of rare words using their distributional similarity to other words of the same language, both to predict unseen co-occurrences and to smooth rare, unreliable ones. Our evaluation shows that the proposed method delivers a consistent and significant improvement over the conventional approach to this task.
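The smoothing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the cosine similarity measure, the linear interpolation, the function names `cosine` and `smooth_vector`, the parameters `k` and `lam`, and the toy co-occurrence counts are all assumptions chosen for illustration. The sketch only shows how a rare word's sparse context vector might borrow co-occurrences from its nearest same-language neighbours before any cross-language mapping takes place.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(ctx, 0.0) for ctx, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def smooth_vector(word, vectors, k=2, lam=0.5):
    """Interpolate a rare word's context vector with a similarity-weighted
    average of its k nearest neighbours (hypothetical smoothing scheme)."""
    target = vectors[word]
    # Rank all other words of the same language by similarity to the target.
    ranked = sorted(
        ((cosine(target, vec), other)
         for other, vec in vectors.items() if other != word),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in ranked)
    if total == 0:
        return dict(target)  # no usable neighbours: leave the vector as-is
    # Similarity-weighted average of the neighbours' context vectors.
    neighbour_avg = {}
    for sim, other in ranked:
        for ctx, w in vectors[other].items():
            neighbour_avg[ctx] = neighbour_avg.get(ctx, 0.0) + (sim / total) * w
    # Linear interpolation: lam weights the observed counts,
    # (1 - lam) weights the evidence borrowed from neighbours.
    contexts = set(target) | set(neighbour_avg)
    return {
        ctx: lam * target.get(ctx, 0.0) + (1 - lam) * neighbour_avg.get(ctx, 0.0)
        for ctx in contexts
    }


# Toy co-occurrence counts (invented for illustration only).
vectors = {
    "sloop": {"sail": 1.0},
    "boat": {"sail": 4.0, "harbour": 3.0},
    "ship": {"sail": 5.0, "harbour": 2.0, "crew": 3.0},
    "tax": {"income": 6.0, "rate": 4.0},
}
smoothed = smooth_vector("sloop", vectors, k=2, lam=0.5)
```

In this toy setting the rare word "sloop" was observed only with "sail", yet its smoothed vector gains non-zero weights for "harbour" and "crew", which it never co-occurred with, while the unrelated contexts of "tax" are not introduced.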

Viktor Pekar, Ruslan Mitkov, Dimitar Blagoev, Andrea Mulloni


Comparable Corpora | Low-frequency Translation Equivalents | MT 2006 | Vector Spaces


» Focused web crawling in the acquisition of comparable corpora

» Identifying Word Translations from Comparable Corpora Using Latent Topic Models

» Using Comparable Corpora to Adapt a Translation Model to Domains

» Using Comparable Corpora to Solve Problems Difficult for Human Translators

» Looking for Candidate Translational Equivalents in Specialized Comparable Corpora

» Word Sense Acquisition from Bilingual Comparable Corpora

» Rare Word Translation Extraction from Aligned Comparable Documents

» A Simple Hybrid Aligner for Generating Lexical Correspondences in Parallel Texts

Post Info

Added	14 Dec 2010
Updated	14 Dec 2010
Type	Journal
Year	2006
Where	MT
Authors	Viktor Pekar, Ruslan Mitkov, Dimitar Blagoev, Andrea Mulloni
