

Looking for Candidate Translational Equivalents in Specialized, Comparable Corpora

Previous attempts at identifying translational equivalents in comparable corpora have dealt with very large 'general language' corpora and words. We address this task in a specialized domain, medicine, starting from smaller non-parallel, comparable corpora and an initial bilingual medical lexicon. We compare the distributional contexts of source and target words, testing several weighting factors and similarity measures. On a test set of frequently occurring words, the best combination (the Jaccard similarity measure, with or without tf-idf weighting) ranks the correct translation first for 20% of our test words and places it among the top 10 candidates for 50% of them. An additional reverse-translation filtering step raises the precision of the top candidate translation to 74%, with 33% recall.
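The pipeline summarized above (context vectors, optional tf-idf weighting, projection through a bilingual lexicon, Jaccard-based ranking, reverse-translation filtering) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the co-occurrence window, the tf-idf variant (count × log(N/df)), the weighted form of Jaccard (sum of minima over sum of maxima), and the filter rule (keep a candidate only if translating it back ranks the source word in its top k) are all assumptions. The function names (`context_vectors`, `rank_candidates`, `reverse_filter`, etc.) and the toy French/English data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=3):
    """Count, for each word, the words occurring within +/- `window` tokens of it."""
    vecs = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vecs[word][tokens[j]] += 1
    return dict(vecs)


def tfidf(vecs):
    """Assumed tf-idf variant: count * log(N / df), where N is the number of
    head words and df the number of head words whose context has the feature."""
    n = len(vecs)
    df = Counter()
    for ctx in vecs.values():
        df.update(ctx.keys())
    return {w: {c: f * math.log(n / df[c]) for c, f in ctx.items()}
            for w, ctx in vecs.items()}


def translate_vector(vec, lexicon):
    """Project a context vector into the other language through the bilingual
    lexicon; context words missing from the lexicon are dropped."""
    out = Counter()
    for word, weight in vec.items():
        for translation in lexicon.get(word, ()):
            out[translation] += weight
    return out


def weighted_jaccard(a, b):
    """Weighted Jaccard: sum of element-wise minima over sum of maxima."""
    keys = set(a) | set(b)
    num = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
    den = sum(max(a.get(k, 0), b.get(k, 0)) for k in keys)
    return num / den if den else 0.0


def rank_candidates(word, src_vecs, tgt_vecs, lexicon, k=10):
    """Return up to k target words with nonzero similarity, best first
    (ties broken alphabetically)."""
    query = translate_vector(src_vecs[word], lexicon)
    scores = {t: weighted_jaccard(query, v) for t, v in tgt_vecs.items()}
    ranked = sorted((t for t, s in scores.items() if s > 0),
                    key=lambda t: (-scores[t], t))
    return ranked[:k]


def invert(lexicon):
    """Build the target-to-source lexicon from a source-to-target one."""
    rev = defaultdict(list)
    for src, targets in lexicon.items():
        for t in targets:
            rev[t].append(src)
    return dict(rev)


def reverse_filter(word, candidates, src_vecs, tgt_vecs, lexicon, k=10):
    """Keep a candidate only if translating it back ranks `word` in its top k."""
    rev = invert(lexicon)
    return [t for t in candidates
            if word in rank_candidates(t, tgt_vecs, src_vecs, rev, k)]


if __name__ == "__main__":
    # Toy French/English data, invented for illustration only.
    lexicon = {"patient": ["patient"], "toux": ["cough"],
               "sang": ["blood"], "douleur": ["pain"]}
    src = context_vectors([["toux", "grippe", "patient"], ["sang", "douleur"]], window=1)
    tgt = context_vectors([["cough", "flu", "patient"], ["blood", "pain"]], window=1)
    print(rank_candidates("grippe", src, tgt, lexicon))  # ['flu']
    print(reverse_filter("grippe", ["flu", "blood"], src, tgt, lexicon))  # ['flu']
```

Here "grippe" is absent from the lexicon, so its translation must be inferred from its context words ("toux", "patient"). The reverse filter discards "blood" because translating its context back points to "sang" rather than "grippe"; filtering of this kind trades recall for precision, which matches the direction of the effect the abstract reports.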
Yun-Chuang Chiao, Pierre Zweigenbaum
Added 17 Dec 2010
Updated 17 Dec 2010
Type Journal
Year 2002