Simple Morpheme Labelling in Unsupervised Morpheme Analysis


Download www.clef-campaign.org

This paper presents my participation in the second Morpho Challenge. Results have been obtained with the algorithm already presented at Morpho Challenge 2005. The system takes a plain list of words as input and returns a list of labelled morphemic segments for each word. Morphemic segments are obtained by an unsupervised learning process which can be applied directly to different natural languages. The system first relies on segment predictability within the longest words in the input word list to identify a set of prefixes and suffixes. Stems are then acquired by stripping affixes from the words. In a third step, words sharing a common stem are compared and split into similar and dissimilar parts corresponding to morphemic segments. Finally, the best segmentation is chosen for each word among all possible segmentations. Results obtained at competition 1 (evaluation of the morpheme analyses) are better in English, Finnish and German than in Turkish. For information retrieval (competi...
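The affix-stripping and labelling idea described above can be illustrated with a minimal Python sketch. It is not the paper's algorithm: here the prefix and suffix sets are supplied by hand, whereas the system learns them without supervision from segment predictability. The function name `segment`, the `min_stem` parameter, and the label strings are assumptions made for this illustration.

```python
def segment(word, prefixes, suffixes, min_stem=3):
    """Return labelled segments for `word` by stripping the longest
    known prefix and suffix (hand-supplied sets, an assumption here)."""
    # Pick the longest prefix that still leaves at least min_stem characters.
    best_prefix = ""
    for p in prefixes:
        if (word.startswith(p) and len(p) > len(best_prefix)
                and len(word) - len(p) >= min_stem):
            best_prefix = p
    rest = word[len(best_prefix):]

    # Pick the longest suffix that still leaves at least min_stem characters.
    best_suffix = ""
    for s in suffixes:
        if (rest.endswith(s) and len(s) > len(best_suffix)
                and len(rest) - len(s) >= min_stem):
            best_suffix = s
    stem = rest[:len(rest) - len(best_suffix)]

    # Emit (segment, label) pairs in surface order.
    segments = []
    if best_prefix:
        segments.append((best_prefix, "prefix"))
    segments.append((stem, "stem"))
    if best_suffix:
        segments.append((best_suffix, "suffix"))
    return segments


if __name__ == "__main__":
    print(segment("unbreakable", {"un", "re"}, {"able", "ing", "s"}))
```

The `min_stem` guard is a simple way to avoid stripping so much that no plausible stem remains; the paper's later steps (comparing words that share a stem and choosing the best segmentation) are not covered by this sketch.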

Delphine Bernhard
