Sciweavers

CORR
2002
Springer

Unsupervised discovery of morphologically related words based on orthographic and semantic similarity

13 years 11 months ago
Unsupervised discovery of morphologically related words based on orthographic and semantic similarity
We present an algorithm that takes an unannotated corpus as its input, and returns a ranked list of probable morphologically related pairs as its output. The algorithm tries to discover morphologically related pairs by looking for pairs that are both orthographically and semantically similar, where orthographic similarity is measured in terms of minimum edit distance, and semantic similarity is measured in terms of mutual information. The procedure does not rely on a morpheme concatenation model, nor on distributional properties of word substrings (such as affix frequency). Experiments with German and English input give encouraging results, both in terms of precision (proportion of good pairs found at various cutoff points of the ranked list), and in terms of a qualitative analysis of the types of morphological patterns discovered by the algorithm.
Marco Baroni, Johannes Matiasek, Harald Trost
Added 18 Dec 2010
Updated 18 Dec 2010
Type Journal
Year 2002
Where CORR
Authors Marco Baroni, Johannes Matiasek, Harald Trost
Comments (0)