

Learning Morphology of Romance, Germanic and Slavic Languages with the Tool Linguistica

In this paper we present preliminary work on the semi-automatic induction of inflectional paradigms from non-annotated corpora using the open-source tool Linguistica (Goldsmith 2001), which can be used without any prior knowledge of the language. The aim is to induce morphological information from corpora so as to compare languages and anticipate the difficulty of developing morphosyntactic lexica. We report on a series of corpus-based experiments run with Linguistica on Romance languages (Catalan, French, Italian, Portuguese and Spanish), Germanic languages (Dutch, English and German) and the Slavic language Polish. For each language we obtained interesting clusters of stems sharing the same suffixes. These clusters can be seen as mini inflectional paradigms that include productive derivational suffixes. We ranked the results per language by paradigm size (the maximum number of suffixes per stem). Results show that this approach is useful for getting a first idea of the role and complexity of infle...
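The abstract does not describe how stems and suffixes are grouped. As a rough illustration only, the Python sketch below groups candidate stems by the exact set of suffixes they occur with (a "signature") and ranks signatures by paradigm size, i.e. number of suffixes. It is not Linguistica's own method, which is based on minimum description length (Goldsmith 2001). The function `induce_signatures`, its thresholds (`min_stem`, `max_suffix`, `min_stems`), the toy vocabulary and the rule that discards a stem when all its suffixes begin with the same letter are hypothetical choices made for this example.

```python
from collections import defaultdict


def induce_signatures(words, min_stem=3, max_suffix=4, min_stems=2):
    """Group candidate stems by the set of suffixes they occur with.

    Naive illustration (hypothetical helper), NOT Linguistica's MDL algorithm.
    The empty string "" stands for a null suffix (the bare stem).
    """
    vocab = set(words)
    stem_suffixes = defaultdict(set)
    for word in vocab:
        # Try every split whose stem has at least `min_stem` characters and
        # whose suffix has at most `max_suffix` characters.
        for i in range(max(min_stem, len(word) - max_suffix), len(word) + 1):
            stem_suffixes[word[:i]].add(word[i:])

    signatures = defaultdict(list)
    for stem, suffixes in stem_suffixes.items():
        if len(suffixes) < 2:
            continue  # a stem seen with a single ending is not a paradigm
        # Hypothetical heuristic: if every suffix starts with the same letter,
        # the stem could be extended by that letter, so skip this analysis.
        if "" not in suffixes and len({s[0] for s in suffixes}) == 1:
            continue
        signatures[tuple(sorted(suffixes))].append(stem)

    # Keep signatures shared by enough stems; largest paradigms first.
    ranked = [(sig, sorted(stems)) for sig, stems in signatures.items()
              if len(stems) >= min_stems]
    ranked.sort(key=lambda item: (-len(item[0]), item[0]))
    return ranked


# Toy English vocabulary (hypothetical, not data from the paper).
VOCAB = ["walk", "walks", "walked", "walking", "talk", "talks", "talked",
         "talking", "jump", "jumps", "jumped", "cat", "cats", "dog", "dogs"]

for signature, stems in induce_signatures(VOCAB):
    # Print the null suffix as NULL, e.g. NULL.ed.ing.s
    print(".".join(s or "NULL" for s in signature), stems)
# NULL.ed.ing.s ['talk', 'walk']
# NULL.s ['cat', 'dog']
```

On this toy input, spurious stems such as `wal`, `tal` and `jum` are dropped by the first-letter rule, and `jump` is left out because no other stem shares its suffix set. The largest paradigm size, here 4, is the per-language figure that a ranking like the one described above would compare; real corpora would yield much noisier clusters.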
Added: 29 Oct 2010
Updated: 29 Oct 2010
Type: Conference
Year: 2010
Where: LREC
Authors: Helena Blancafort