ACL
2003

Unsupervised Segmentation of Words Using Prior Distributions of Morph Length and Frequency

We present a language-independent, unsupervised algorithm for segmenting words into morphs. The algorithm is based on a new generative probabilistic model that makes use of relevant prior information on the length and frequency distributions of morphs in a language. When evaluated on data from Finnish, a language with agglutinative morphology, our algorithm is shown to outperform two competing algorithms; it also performs well on English data.
Mathias Creutz
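The abstract does not give the model's equations. The code below is a minimal sketch, under loudly stated assumptions, of one way a generative model with priors on morph length and frequency could score a candidate segmentation. It uses three assumed components: a gamma density over morph length, a power-law prior over morph frequency, and a uniform letter model for spelling out each distinct morph. These distributions, the hyperparameter values (GAMMA_SHAPE, GAMMA_SCALE, ZIPF_B, ALPHABET_SIZE), and all function names are hypothetical choices made for illustration. They are not taken from the paper. A lower cost means a more probable segmentation. A search procedure, not shown here, would look for the lowest-cost segmentation.

```python
import math
from collections import Counter

# Illustrative sketch only; not the paper's exact model.
# Hypothetical placeholder hyperparameters, not values from the paper.
GAMMA_SHAPE = 2.0
GAMMA_SCALE = 3.0
ZIPF_B = 0.5
ALPHABET_SIZE = 26


def length_log_prior(length):
    """Assumed prior: log gamma density evaluated at a morph length
    (a continuous approximation to a prior over integer lengths)."""
    return ((GAMMA_SHAPE - 1) * math.log(length) - length / GAMMA_SCALE
            - math.lgamma(GAMMA_SHAPE) - GAMMA_SHAPE * math.log(GAMMA_SCALE))


def frequency_log_prior(freq):
    """Assumed prior: power law over token frequencies f >= 1,
    P(f) = f**(-1/b) - (f+1)**(-1/b), whose sum over f telescopes to 1."""
    return math.log(freq ** (-1 / ZIPF_B) - (freq + 1) ** (-1 / ZIPF_B))


def spelling_log_prob(morph):
    """Assumed letter model: letters uniform and independent."""
    return -len(morph) * math.log(ALPHABET_SIZE)


def segmentation_cost(segmentation, word_counts):
    """Cost in nats: -log P(corpus | morphs) - log P(morph lexicon).

    segmentation maps each word to its list of morphs; word_counts maps
    each word to its corpus frequency.
    """
    morph_counts = Counter()
    for word, count in word_counts.items():
        for morph in segmentation[word]:
            morph_counts[morph] += count
    total = sum(morph_counts.values())
    # Corpus likelihood, using maximum-likelihood morph probabilities.
    log_likelihood = sum(c * math.log(c / total) for c in morph_counts.values())
    # Lexicon: each distinct morph pays for its length, frequency and spelling.
    log_lexicon = sum(length_log_prior(len(m)) + frequency_log_prior(c)
                      + spelling_log_prob(m)
                      for m, c in morph_counts.items())
    return -(log_likelihood + log_lexicon)


if __name__ == "__main__":
    # Toy Finnish data: talo "house", talossa "in the house", talot "houses".
    words = {"talo": 5, "talossa": 3, "talot": 2}
    unsplit = {w: [w] for w in words}
    split = {"talo": ["talo"], "talossa": ["talo", "ssa"], "talot": ["talo", "t"]}
    print(f"unsplit: {segmentation_cost(unsplit, words):.2f}")
    print(f"split:   {segmentation_cost(split, words):.2f}")
```

On this three-word toy corpus, the split lexicon comes out cheaper (about 57.1 nats versus 78.8). The main reason is that the stem talo is spelled out once instead of three times. Without the spelling term, the unsplit lexicon would score lower (about 26.7 versus 31.0 nats). The frequency prior on its own penalizes the resulting high-frequency morphs, so in this sketch it is the cost of the lexicon's letters that rewards sharing morphs across words.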
Added 31 Oct 2010
Updated 31 Oct 2010
Type Conference
Year 2003
Where ACL
Authors Mathias Creutz