
CLEF
2009
Springer

Morphological Analysis by Multiple Sequence Alignment

In biological sequence processing, Multiple Sequence Alignment (MSA) techniques capture information about long-distance dependencies and the three-dimensional structure of protein and nucleotide sequences without resorting to polynomial-complexity context-free models. MSA techniques have nevertheless rarely been used in natural language (NL) processing, and never for NL morphology induction. Our MetaMorph algorithm is a first attempt at leveraging MSA techniques to induce NL morphology in an unsupervised fashion. Given a text corpus in any language, MetaMorph sequentially aligns the words of the corpus to form an MSA and then segments the MSA to produce morphological analyses. Over corpora that contain millions of unique word types, MetaMorph identifies morphemes with an F1 score below the state of the art. When restricted to smaller sets of orthographically related words, however, MetaMorph outperforms the state-of-the-art ParaMor-Morfessor Union morphology induction system. Tested on 5,000 orthographi...
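The two stages named in the abstract (sequential alignment of words into an MSA, then segmentation of the MSA) can be illustrated with a toy sketch. This is not the MetaMorph algorithm: the abstract does not give its scoring scheme or its segmentation criterion, so everything below is an assumption made for illustration. The function names (`add_word`, `build_msa`, `segment`), the match, mismatch, and gap scores, and the rule of cutting wherever a fully conserved column meets a variable one are all hypothetical.

```python
GAP = "-"


def column_score(msa, j, ch):
    """Average agreement of `ch` with column j of the MSA (assumed scoring:
    +1 for an identical character, -1 otherwise, gaps count as mismatches)."""
    column = [row[j] for row in msa]
    return sum(1.0 if c == ch else -1.0 for c in column) / len(column)


def add_word(msa, word, gap_penalty=-1.0):
    """Align `word` to an existing MSA (a list of equal-length strings)
    with Needleman-Wunsch dynamic programming and return the enlarged MSA."""
    if not msa:
        return [word]
    n, m = len(msa[0]), len(word)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap_penalty
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap_penalty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + column_score(msa, i - 1, word[j - 1]),
                score[i - 1][j] + gap_penalty,  # word skips an MSA column
                score[i][j - 1] + gap_penalty,  # word opens a new column
            )
    # Trace back from the bottom-right corner, building columns in reverse.
    rows = [[] for _ in msa]
    new_row = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and score[i][j]
                == score[i - 1][j - 1] + column_score(msa, i - 1, word[j - 1])):
            for r, row in enumerate(msa):
                rows[r].append(row[i - 1])
            new_row.append(word[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap_penalty:
            for r, row in enumerate(msa):
                rows[r].append(row[i - 1])
            new_row.append(GAP)
            i -= 1
        else:
            for r in range(len(msa)):
                rows[r].append(GAP)
            new_row.append(word[j - 1])
            j -= 1
    aligned = ["".join(reversed(r)) for r in rows]
    aligned.append("".join(reversed(new_row)))
    return aligned


def build_msa(words):
    """Sequentially align each word to the growing MSA."""
    msa = []
    for word in words:
        msa = add_word(msa, word)
    return msa


def segment(msa):
    """Hypothetical segmentation rule: cut between adjacent columns when one
    is fully conserved (all rows agree) and the other is not."""
    width = len(msa[0])
    conserved = [len({row[j] for row in msa}) == 1 for j in range(width)]
    cuts = [j for j in range(1, width) if conserved[j] != conserved[j - 1]]
    analyses = []
    for row in msa:
        pieces, start = [], 0
        for cut in cuts + [width]:
            piece = row[start:cut].replace(GAP, "")
            if piece:
                pieces.append(piece)
            start = cut
        analyses.append(pieces)
    return analyses


if __name__ == "__main__":
    msa = build_msa(["walk", "walks", "walked"])
    print(segment(msa))  # [['walk'], ['walk', 's'], ['walk', 'ed']]
```

On a small set of orthographically related words, which is the setting where the abstract reports MetaMorph's strongest results, even this crude rule separates a shared stem from its suffixes. It would not scale to the large corpora mentioned above, where few columns stay fully conserved.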
Tzvetan Tchoukalov, Christian Monson, Brian Roark
Added 08 Nov 2010
Updated 08 Nov 2010
Type Conference
Year 2009
Where CLEF