Self-Supervised Chinese Word Segmentation

Abstract. We propose a new unsupervised training method for acquiring probability models that accurately segment Chinese character sequences into words. By constructing a core lexicon to guide unsupervised word learning, self-supervised segmentation overcomes the local maxima problems that hamper standard EM training. Our procedure uses successive EM phases to learn a good probability model over character strings, and then prunes this model with a mutual information selection criterion to obtain a more accurate word lexicon. The segmentations produced by these models are more accurate than those produced by training with EM alone.
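As a rough illustration of the two ingredients the abstract names, the sketch below shows (a) finding the most probable segmentation of a character string under a word-probability model and (b) a mutual information score that could be used to decide whether a multi-character lexicon entry is worth keeping. This is not the authors' implementation: the unigram word model, the function names `viterbi_segment` and `pointwise_mi`, the `max_len` limit, and the toy probabilities are all assumptions made for the example, and the abstract does not state the exact form of the selection criterion. The EM phases that would estimate the probabilities are omitted.

```python
import math

def viterbi_segment(text, word_prob, max_len=4):
    """Most probable segmentation of `text` under a unigram word model.

    `word_prob` maps candidate words to probabilities (assumed given here;
    in the paper's setting they would come from EM training).
    Returns a list of words, or None if no segmentation is possible.
    """
    n = len(text)
    # best[i] = (log-probability of the best segmentation of text[:i],
    #            start index of the last word in that segmentation)
    best = [(-math.inf, 0)] * (n + 1)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            p = word_prob.get(text[start:end])
            if p is None or best[start][0] == -math.inf:
                continue
            score = best[start][0] + math.log(p)
            if score > best[end][0]:
                best[end] = (score, start)
    if best[n][0] == -math.inf:
        return None
    # Walk the back-pointers from the end of the string to recover the words.
    words, end = [], n
    while end > 0:
        start = best[end][1]
        words.append(text[start:end])
        end = start
    return words[::-1]

def pointwise_mi(word, split, word_prob):
    """log( p(word) / (p(left) * p(right)) ) for one split point.

    A high score suggests the parts co-occur far more often than chance,
    so the entry is a plausible word; a low score marks a pruning candidate.
    (Illustrative criterion only, assumed for this sketch.)
    """
    left, right = word[:split], word[split:]
    return math.log(word_prob[word] / (word_prob[left] * word_prob[right]))

# Toy lexicon with made-up probabilities (hypothetical, for illustration).
toy_lexicon = {
    "中": 0.1, "国": 0.1, "人": 0.2, "民": 0.05,
    "中国": 0.3, "人民": 0.2, "国人": 0.05,
}

print(viterbi_segment("中国人民", toy_lexicon))  # ['中国', '人民']
```

With these toy numbers, `pointwise_mi("中国", 1, toy_lexicon)` is larger than `pointwise_mi("国人", 1, toy_lexicon)`, so a threshold on the score would drop "国人" from the lexicon before it dropped "中国".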

Fuchun Peng, Dale Schuurmans

IDA 2001 | Information Management | Probability Model | Standard EM Training | Unsupervised Training Method

Added	30 Jul 2010
Updated	30 Jul 2010
Type	Conference
Year	2001
Where	IDA
Authors	Fuchun Peng, Dale Schuurmans
