Bilingual lexicon extraction from comparable corpora using in-domain terms


Download www.aclweb.org

Many existing methods for learning bilingual lexicons from comparable corpora rely on the similarity of context vectors. These methods suffer from noisy vectors, which greatly reduce their accuracy. We introduce a method for filtering this noise that allows highly accurate learning of bilingual lexicons. Our method is based on the notion of in-domain terms, which can be thought of as the most important contextually relevant words, and we provide a method for identifying such terms. Our evaluation shows that the proposed method can learn highly accurate bilingual lexicons without using orthographic features or a large initial seed dictionary. In addition, we introduce a method for measuring the similarity between two words in different languages without requiring any initial dictionary.
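The generic context-vector approach that the abstract builds on can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: it assumes whitespace-tokenized corpora, raw co-occurrence counts within a fixed window, a small seed dictionary for mapping source context words into the target language, and cosine similarity. The optional `keep` set, which restricts context words to a chosen vocabulary, is a hypothetical stand-in for filtering context vectors down to in-domain terms; the paper's actual procedure for identifying those terms is not reproduced here. All function names are illustrative.

```python
from collections import Counter
from math import sqrt


def context_vector(tokens, target, window=2, keep=None):
    """Count words co-occurring with `target` within `window` positions.

    If `keep` is given, only words in that set are counted. This is a
    hypothetical simplification of restricting context to in-domain terms.
    """
    vec = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and (keep is None or tokens[j] in keep):
                vec[tokens[j]] += 1
    return vec


def translate_vector(vec, seed):
    """Map source-language context words to the target language via a seed
    dictionary; words missing from the dictionary are dropped."""
    out = Counter()
    for word, count in vec.items():
        if word in seed:
            out[seed[word]] += count
    return out


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    denom = sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / denom if denom else 0.0


def rank_candidates(src_word, src_tokens, tgt_tokens, candidates, seed,
                    window=2, src_keep=None, tgt_keep=None):
    """Rank target-language candidates by context-vector similarity to src_word."""
    src_vec = translate_vector(
        context_vector(src_tokens, src_word, window, src_keep), seed)
    scored = [(cand, cosine(src_vec,
                            context_vector(tgt_tokens, cand, window, tgt_keep)))
              for cand in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

On a toy English source `["hospital", "doctor", "patient"]` and French target `["hopital", "medecin", "patient", "banque", "argent"]`, with seed dictionary `{"hospital": "hopital", "patient": "patient"}` and `window=1`, `rank_candidates("doctor", ...)` over the candidates `["medecin", "banque"]` ranks `medecin` (cosine 1.0) above `banque` (cosine 0.5). Practical systems of this kind usually replace raw counts with association scores such as PMI or the log-likelihood ratio.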

Azniah Ismail, Suresh Manandhar


Accurate Bilingual Lexicons | Bilingual Lexicons | COLING 2010 | Computational Linguistics | Many Existing Methods |


Post Info

Added	13 May 2011
Updated	13 May 2011
Type	Journal
Year	2010
Where	COLING
Authors	Azniah Ismail, Suresh Manandhar


Sciweavers
