Unsupervised phonemic Chinese word segmentation using Adaptor Grammars

Adaptor grammars are a framework for expressing and performing inference over a variety of non-parametric linguistic models. These models currently provide state-of-the-art performance on unsupervised word segmentation from phonemic representations of child-directed unsegmented English utterances. This paper investigates the applicability of these models to unsupervised word segmentation of Mandarin. We investigate a wide variety of different segmentation models, and show that the best segmentation accuracy is obtained from models that capture inter-word "collocational" dependencies. Surprisingly, enhancing the models to exploit syllable structure regularities and to capture tone information does not improve overall word segmentation accuracy, perhaps because the information these elements convey is redundant when compared to the inter-word dependencies.

Mark Johnson, Katherine Demuth

COLING 2010 | Computational Linguistics | Segmentation Accuracy | Unsupervised Word Segmentation | Word Segmentation |

Post Info

Added	13 May 2011
Updated	13 May 2011
Type	Conference
Year	2010
Where	COLING
Authors	Mark Johnson, Katherine Demuth
