A system for the automatic segmentation of German words into morphs has been developed. Its main linguistic knowledge sources are a word syntax and a morph dictionary. The syntax is written as a right-linear regular grammar of approximately 1,400 rules that describe the sequences of morph classes underlying syntactically well-formed words. The morph dictionary contains almost 11,000 morphs, each assigned to up to six morph classes. In a statistical evaluation on 6,000 test words, more than 99% of the segmented words received a correct segmentation.
T. Pachunke, O. Mertineit, Klaus Wothke, Rudolf Sc
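
The interplay of morph dictionary and right-linear word syntax described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the morph entries, the class names (`PREFIX`, `STEM`, `SUFFIX`), the grammar states (`S`, `A`, `B`), and the exhaustive search strategy are hypothetical and are not taken from the actual system, which uses about 1,400 rules and almost 11,000 morphs. A right-linear grammar is equivalent to a finite-state automaton, so the rules are encoded here as state transitions labelled with morph classes.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy morph dictionary: morph -> set of morph classes.
# (The real dictionary and its class inventory are not given in the abstract.)
MORPH_DICT: Dict[str, Set[str]] = {
    "un": {"PREFIX"},
    "klar": {"STEM"},
    "heit": {"SUFFIX"},
    "haus": {"STEM"},
    "tuer": {"STEM"},
    "kind": {"STEM"},
    "er": {"PREFIX", "SUFFIX"},  # a morph may belong to several classes
}

# Hypothetical right-linear rules, written as automaton transitions:
# (state, morph class) -> set of successor states.
# For example, the rule  S -> PREFIX S  becomes ("S", "PREFIX") -> {"S"}.
RULES: Dict[Tuple[str, str], Set[str]] = {
    ("S", "PREFIX"): {"S"},
    ("S", "STEM"): {"A"},
    ("A", "STEM"): {"A"},    # compounds: a stem followed by another stem
    ("A", "SUFFIX"): {"B"},
    ("B", "SUFFIX"): {"B"},
}

# States in which a word may end (rules of the form  X -> epsilon).
ACCEPTING: Set[str] = {"A", "B"}

Segmentation = List[Tuple[str, str]]


def segment(word: str, state: str = "S") -> List[Segmentation]:
    """Return every segmentation of `word` into dictionary morphs whose
    morph-class sequence is accepted by the toy word syntax."""
    if not word:
        # The whole word is consumed: valid only in an accepting state.
        return [[]] if state in ACCEPTING else []
    results: List[Segmentation] = []
    for end in range(1, len(word) + 1):
        morph = word[:end]
        for morph_class in sorted(MORPH_DICT.get(morph, ())):
            for next_state in sorted(RULES.get((state, morph_class), ())):
                for rest in segment(word[end:], next_state):
                    results.append([(morph, morph_class)] + rest)
    return results


if __name__ == "__main__":
    for w in ["unklarheit", "haustuer", "kinder", "heitklar"]:
        print(w, segment(w))
```

In this sketch a word such as `unklarheit` yields exactly one segmentation (`un` + `klar` + `heit`), while `heitklar` yields none because no rule allows a suffix in the start state. The sketch enumerates all analyses and does not rank them; how the actual system chooses among competing segmentations is not stated in the abstract.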