Looking for Candidate Translational Equivalents in Specialized, Comparable Corpora

Previous attempts at identifying translational equivalents in comparable corpora have dealt with very large "general language" corpora and words. We address this task in a specialized domain, medicine, starting from smaller non-parallel, comparable corpora and an initial bilingual medical lexicon. We compare the distributional contexts of source and target words, testing several weighting factors and similarity measures. On a test set of frequently occurring words, for the best combination (the Jaccard similarity measure with or without tf.idf weighting), the correct translation is ranked first for 20% of our test words, and is found in the top 10 candidates for 50% of them. An additional reverse-translation filtering step improves the precision of the top candidate translation up to 74%, with a 33% recall.
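
The context-comparison step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the three-token window, the toy French and English sentences, the seed lexicon, and the choice of the weighted (min/max) form of the Jaccard measure are all assumptions made for illustration. Raw co-occurrence counts stand in for the weighting factors, and the reverse-translation filtering step is omitted.

```python
from collections import Counter


def context_vector(sentences, word, window=3):
    """Count tokens co-occurring with `word` within +/- `window` positions."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok == word:
                lo, hi = max(0, i - window), i + window + 1
                # enumerate from `lo` so j is the original token index
                counts.update(t for j, t in enumerate(tokens[lo:hi], lo) if j != i)
    return counts


def translate_vector(vector, lexicon):
    """Map source context words into the target language via the seed lexicon.

    Context words missing from the lexicon are dropped.
    """
    out = Counter()
    for w, c in vector.items():
        for t in lexicon.get(w, ()):
            out[t] += c
    return out


def weighted_jaccard(u, v):
    """Weighted Jaccard similarity: sum of minima over sum of maxima."""
    keys = set(u) | set(v)
    denom = sum(max(u.get(k, 0), v.get(k, 0)) for k in keys)
    if denom == 0:
        return 0.0
    return sum(min(u.get(k, 0), v.get(k, 0)) for k in keys) / denom


def rank_candidates(src_word, src_sents, tgt_sents, lexicon, candidates, window=3):
    """Rank target candidates by context similarity to the source word."""
    src_vec = translate_vector(context_vector(src_sents, src_word, window), lexicon)
    scored = [
        (weighted_jaccard(src_vec, context_vector(tgt_sents, c, window)), c)
        for c in candidates
    ]
    # Highest similarity first; ties broken alphabetically for determinism.
    return [c for _, c in sorted(scored, key=lambda sc: (-sc[0], sc[1]))]


# Hypothetical toy data (not taken from the paper's corpora).
src_sents = [["patient", "douleur", "thorax"], ["douleur", "thorax", "traitement"]]
tgt_sents = [
    ["patient", "pain", "chest"],
    ["pain", "chest", "treatment"],
    ["patient", "fever", "infection"],
]
lexicon = {"patient": ["patient"], "thorax": ["chest"], "traitement": ["treatment"]}

ranking = rank_candidates(
    "douleur", src_sents, tgt_sents, lexicon, ["pain", "fever", "infection"]
)
print(ranking)
```

In this toy setting the translated context of "douleur" matches the context of "pain" exactly, so "pain" is ranked first. With real corpora the counts would typically be reweighted (for example with tf.idf) before the similarity is computed.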

Yun-Chuang Chiao, Pierre Zweigenbaum

Bilingual Medical Lexicon | COLING 2002 | COLING 2008 | Comparable Corpora | Similarity Measure |

Added	17 Dec 2010
Updated	17 Dec 2010
Type	Journal
Year	2002
Where	COLING
Authors	Yun-Chuang Chiao, Pierre Zweigenbaum
