High-Performance, Language-Independent Morphological Segmentation



This paper introduces an unsupervised morphological segmentation algorithm that performs robustly across four languages with different levels of morphological complexity. In particular, our algorithm outperforms Goldsmith’s Linguistica and Creutz and Lagus’s Morphessor for English and Bengali, and achieves performance comparable to the best results on all three PASCAL evaluation datasets. The improvements arise from (1) the use of relative corpus frequency and suffix-level similarity to detect incorrect morpheme attachments and (2) the induction of orthographic rules and allomorphs to segment words whose roots undergo spelling changes when morphemes are attached.
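To make the two ideas in the abstract more concrete, here is a toy sketch in Python. It is not the authors' algorithm. The suffix list, the two spelling rules (final-"e" restoration and consonant undoubling), the frequency threshold `min_ratio`, and all corpus counts are hypothetical values chosen for illustration. It shows how an orthographic rule can recover a root whose spelling changed ("hoping" → "hope" + "ing") and how a relative-frequency test can reject an attachment whose candidate root is too rare compared with the full word ("news" is not split into "new" + "s"). Suffix-level similarity is not modeled here.

```python
from typing import Dict, Iterator, Optional, Tuple

# Hypothetical toy corpus counts (illustrative only, not taken from the paper).
FREQ: Dict[str, int] = {
    "hope": 120, "hoping": 40,
    "walk": 200, "walking": 80,
    "run": 150, "running": 50,
    "new": 100, "news": 300,
    "bring": 60,
}

# Hypothetical suffix inventory; a real system would induce this from data.
SUFFIXES = ("ing", "ed", "s")


def candidate_roots(stem: str) -> Iterator[str]:
    """Yield the bare stem plus stems produced by assumed orthographic rules."""
    yield stem
    # Assumed rule: a final 'e' is deleted before the suffix (hope -> hoping).
    yield stem + "e"
    # Assumed rule: a final consonant is doubled before the suffix (run -> running).
    if len(stem) >= 2 and stem[-1] == stem[-2]:
        yield stem[:-1]


def segment(word: str, freq: Dict[str, int],
            min_ratio: float = 0.5) -> Optional[Tuple[str, str]]:
    """Return (root, suffix) for the first acceptable split, or None."""
    word_freq = freq.get(word, 0)
    for suffix in SUFFIXES:
        if not word.endswith(suffix) or len(word) <= len(suffix):
            continue
        stem = word[: -len(suffix)]
        for root in candidate_roots(stem):
            root_freq = freq.get(root, 0)
            # Assumed relative-frequency test: treat the attachment as incorrect
            # when the root is much rarer than the word it supposedly derives.
            if root_freq > 0 and root_freq >= min_ratio * word_freq:
                return root, suffix
    return None


if __name__ == "__main__":
    for w in ("hoping", "running", "walking", "bring", "news"):
        print(w, segment(w, FREQ))
```

Candidate roots are tried in a fixed order (bare stem first, then rule-derived stems), so an unaltered root always wins when it exists. "bring" stays unsegmented because no candidate root ("br", "bre") occurs in the toy counts.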

Sajib Dasgupta, Vincent Ng


Computational Linguistics | Incorrect Morpheme Attachments | Morpheme Attachments | Morphological Segmentation Algorithm | NAACL 2007


» A Language-Independent Unsupervised Model for Morphological Segmentation

» Combining Morpheme-based Machine Translation with Postprocessing Morpheme Prediction

» STeP-1: A Set of Fundamental Tools for Persian Text Processing

» Lexicalized Phonotactic Word Segmentation

Post Info

Added	30 Oct 2010
Updated	30 Oct 2010
Type	Conference
Year	2007
Where	NAACL
Authors	Sajib Dasgupta, Vincent Ng


Sciweavers
