

Building a Bilingual Representation of the Roget Thesaurus for French to English Machine Translation

This paper describes a solution to lexical transfer that is a trade-off between a dictionary and an ontology, and shows how it is coupled with a translation tool based on morpho-syntactic parsing of the source language. The approach relies on the English Roget Thesaurus and its French equivalent, the Larousse Thesaurus, within a computational framework. Both thesauri are transformed into vector spaces, and all monolingual entries are represented as vectors, with 1000 components for English and 873 for French. The indexing concepts of each thesaurus form the generating family of the corresponding vector space. A bilingual data structure transforms French entries into vectors in the English space by using the representations of their English equivalents. Word sense disambiguation consists in choosing the appropriate vector among these 'bilingual' vectors, by computing the contextualized vector of a given word in its source sentence, placing it in the English vector space, and computing the closest distance to ...
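
The disambiguation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract is truncated before it names the distance measure, so cosine similarity is assumed here, and the concept indices, weights, example words, and the helper names `sparse_vector`, `cosine_similarity`, and `disambiguate` are all hypothetical. Only the dimension of the English space (1000) comes from the abstract.

```python
import math

ENGLISH_DIM = 1000  # number of components of the English space, per the abstract


def sparse_vector(concept_weights):
    """Build a dense vector from a {concept_index: weight} mapping."""
    vec = [0.0] * ENGLISH_DIM
    for index, weight in concept_weights.items():
        vec[index] = weight
    return vec


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (assumed measure, not from the paper)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def disambiguate(context_vector, candidates):
    """Return the candidate translation whose vector is closest to the context."""
    return max(
        candidates,
        key=lambda word: cosine_similarity(context_vector, candidates[word]),
    )


# Toy 'bilingual' vectors for the French word "avocat", expressed in the
# English space. The concept indices and weights are invented for illustration.
candidates = {
    "lawyer": sparse_vector({10: 1.0, 11: 0.5}),
    "avocado": sparse_vector({200: 1.0, 201: 0.5}),
}

# Hypothetical contextualized vector of "avocat" in a sentence about a trial.
context = sparse_vector({10: 0.8, 300: 0.3})

best = disambiguate(context, candidates)
```

With these toy values, the context shares a concept only with the "lawyer" vector, so that candidate is selected. A real system would derive the candidate and context vectors from the thesaurus entries and from the parsed source sentence.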
Violaine Prince, Jacques Chauché
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2008
Where LREC
Authors Violaine Prince, Jacques Chauché