Learning to Lemmatise Slovene Words

Abstract. Automatic lemmatisation is a core application for many language processing tasks. In inflectionally rich languages, such as Slovene, assigning the correct lemma to each word in a running text is not trivial: nouns and adjectives, for instance, inflect for number and case, with a complex configuration of endings and stem modifications. The problem is especially difficult for unknown words, as word forms cannot be matched against a lexicon giving the correct lemma, its part-of-speech and paradigm class. The paper discusses a machine learning approach to the automatic lemmatisation of unknown words, in particular nouns and adjectives, in Slovene texts. We decompose the problem of learning to perform lemmatisation into two subproblems: the first is to learn to perform morphosyntactic tagging, and the second is to learn to perform morphological analysis, which produces the lemma from the word form given the correct morphosyntactic tag. A statistics-based trigram tagger is use...
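
The two-stage decomposition described in the abstract (tag first, then derive the lemma from the word form and its tag) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the tag labels, the suffix rules, and the functions `toy_tagger`, `analyse` and `lemmatise` are hypothetical and not taken from the paper. In particular, the stub tagger only looks at the final letter of the word, whereas the paper's tagger is statistical, and the rules here are hand-written rather than learned.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical suffix rules keyed by morphosyntactic tag:
# each rule is (ending to strip, ending to add).
# Tag labels and rules are illustrative assumptions, not the paper's.
SUFFIX_RULES: Dict[str, List[Tuple[str, str]]] = {
    "Ncfsg": [("e", "a")],  # feminine noun, singular genitive: mize -> miza
    "Ncfsa": [("o", "a")],  # feminine noun, singular accusative: mizo -> miza
    "Ncfsn": [("", "")],    # singular nominative is already the lemma
}


def toy_tagger(word_form: str) -> str:
    """Stage 1 stand-in: guess a tag from the last letter only."""
    if word_form.endswith("e"):
        return "Ncfsg"
    if word_form.endswith("o"):
        return "Ncfsa"
    return "Ncfsn"


def analyse(word_form: str, tag: str) -> str:
    """Stage 2: produce the lemma from the word form, given its tag."""
    for strip, add in SUFFIX_RULES.get(tag, []):
        if word_form.endswith(strip):
            stem = word_form[: len(word_form) - len(strip)]
            return stem + add
    # No rule applies: fall back to the word form itself.
    return word_form


def lemmatise(word_form: str, tagger: Callable[[str], str] = toy_tagger) -> str:
    """Chain the two stages: tag the word, then analyse it under that tag."""
    return analyse(word_form, tagger(word_form))


if __name__ == "__main__":
    for form in ["miza", "mize", "mizo"]:
        print(form, "->", lemmatise(form))
```

Keeping the tagger as a parameter of `lemmatise` mirrors the decomposition: either stage can be replaced independently, and an error in the predicted tag propagates directly to the lemma.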

Saso Dzeroski, Tomaz Erjavec

Automated Reasoning | Correct Lemma | LLL 1999 | Morphosyntactic Tagging | Unknown Words |

» Intraclausal Coordination and Clause Detection as a Preprocessing Step to Dependency Parsi...

» Machine Learning of Morphosyntactic Structure: Lemmatizing Unknown Slovene Words

» A global model for joint lemmatization and part-of-speech prediction

» Morphosyntactic Tagging of Slovene Using Progol

Post Info

Added	04 Aug 2010
Updated	04 Aug 2010
Type	Conference
Year	1999
Where	LLL
Authors	Saso Dzeroski, Tomaz Erjavec
