English-Spanish Large Statistical Dictionary of Inflectional Forms

Download www.lrec-conf.org

The paper presents an approach for constructing a weighted bilingual dictionary of inflectional forms that takes a traditional bilingual dictionary, rather than parallel corpora, as input. The algorithm generates all possible morphological (inflectional) forms and weights them using the distribution of the corresponding grammar sets (grammar information) in large corpora for each language. It also takes into account the compatibility of grammar sets across the language pair; for example, a verb in the past tense in one language is normally expected to be translated by a verb in the past tense in the other language. We consider the method universal, i.e., applicable to any pair of languages. The resulting dictionary is freely available and can be used in several NLP tasks, for example statistical machine translation.
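The weighting idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy inflection tables, the made-up grammar-set frequencies, the `compatibility` function with its mismatch penalty, and the choice to multiply the three factors and normalize over all form pairs. The paper's actual formula may differ.

```python
# Hypothetical toy inflection tables: lemma -> {grammar set: inflected form}.
EN_FORMS = {"walk": {("V", "pres"): "walks", ("V", "past"): "walked"}}
ES_FORMS = {"caminar": {("V", "pres"): "camina", ("V", "past"): "caminó"}}

# Made-up relative frequencies of grammar sets, standing in for corpus counts.
EN_FREQ = {("V", "pres"): 0.6, ("V", "past"): 0.4}
ES_FREQ = {("V", "pres"): 0.7, ("V", "past"): 0.3}


def compatibility(gs_src, gs_tgt, mismatch=0.1):
    """Assumed compatibility score: 1.0 for identical grammar sets,
    a small penalty value otherwise."""
    return 1.0 if gs_src == gs_tgt else mismatch


def weighted_form_pairs(src_lemma, tgt_lemma):
    """Generate every pair of inflected forms for one dictionary entry
    and weight it by grammar-set frequency and compatibility."""
    raw = {}
    for gs_s, form_s in EN_FORMS[src_lemma].items():
        for gs_t, form_t in ES_FORMS[tgt_lemma].items():
            score = EN_FREQ[gs_s] * ES_FREQ[gs_t] * compatibility(gs_s, gs_t)
            raw[(form_s, form_t)] = score
    total = sum(raw.values())
    # Normalize so the weights for this entry sum to 1.
    return {pair: score / total for pair, score in raw.items()}


if __name__ == "__main__":
    pairs = weighted_form_pairs("walk", "caminar")
    for pair, weight in sorted(pairs.items(), key=lambda kv: -kv[1]):
        print(pair, round(weight, 3))
```

In this sketch, pairs whose grammar sets agree (past with past, present with present) receive higher weights than mismatched pairs, which mirrors the compatibility example given in the abstract.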

Grigori Sidorov, Alberto Barrón-Cedeño, Paolo Rosso

Bilingual Dictionary | Education | Grammar Sets | LREC 2010 | Weighted Bilingual Dictionary

» Acquiring a Poor Man's Inflectional Lexicon for German

» Statistical Morphological Disambiguation for Agglutinative Languages

» Arabic Morphological Tagging Diacritization and Lemmatization Using Lexeme Models and Feat...

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2010
Where	LREC
Authors	Grigori Sidorov, Alberto Barrón-Cedeño, Paolo Rosso
