COLING
2010

Unsupervised phonemic Chinese word segmentation using Adaptor Grammars

Adaptor grammars are a framework for expressing and performing inference over a variety of non-parametric linguistic models. These models currently provide state-of-the-art performance on unsupervised word segmentation from phonemic representations of child-directed unsegmented English utterances. This paper investigates the applicability of these models to unsupervised word segmentation of Mandarin. We investigate a wide variety of different segmentation models, and show that the best segmentation accuracy is obtained from models that capture inter-word "collocational" dependencies. Surprisingly, enhancing the models to exploit syllable structure regularities and to capture tone information does not improve overall word segmentation accuracy, perhaps because the information these elements convey is redundant when compared to the inter-word dependencies.
Mark Johnson, Katherine Demuth
Added 13 May 2011
Updated 13 May 2011
Type Conference
Year 2010
Where COLING
Authors Mark Johnson, Katherine Demuth