Abstract. In this paper, we present an unsupervised algorithm for morpheme discovery called UNGRADE (UNsupervised GRAph DEcomposition). UNGRADE works in three steps and can be applied to languages whose words have the structure prefixes-stem-suffixes. In the first step, a stem is obtained for each word using a sliding window, such that the description length of the window is minimised. In the next step prefix and suffix sequences are sought using a morpheme graph. The last step consists in combining morphemes found in the previous steps. UNGRADE has been experimentally evaluated on 5 languages (English, German, Finnish, Turkish and Arabic) with encouraging results.
Bruno Golénia, Sebastian Spiegler, Peter A.