Sciweavers

ACL
2006

Unsupervised Segmentation of Chinese Text by Use of Branching Entropy

We propose an unsupervised segmentation method based on an assumption about language data: that a point where the entropy of successive characters increases is the location of a word boundary. A large-scale experiment was conducted using 200 MB of unsegmented training data and 1 MB of test data, and a precision of 90% was attained with recall of around 80%. Moreover, we found that precision remained stable at around 90% regardless of the size of the training data.
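The assumption in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the fixed context length `n`, the threshold `delta`, and the function names `train_successors`, `branching_entropy`, and `segment` are all hypothetical choices made for illustration. The sketch counts which character follows each n-character context in unsegmented text, computes the entropy of that next-character distribution, and places a boundary wherever this entropy rises from one position to the next.

```python
import math
from collections import Counter, defaultdict


def train_successors(corpus, n):
    """Count which character follows each n-character context in the corpus."""
    successors = defaultdict(Counter)
    for i in range(len(corpus) - n):
        successors[corpus[i:i + n]][corpus[i + n]] += 1
    return successors


def branching_entropy(successors, context):
    """Entropy (in bits) of the next-character distribution after `context`.

    Contexts never seen in training are given entropy 0.0 (an assumption
    of this sketch, not something stated in the abstract).
    """
    counts = successors.get(context)
    if not counts:
        return 0.0
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def segment(text, successors, n=1, delta=0.0):
    """Split `text` wherever branching entropy rises by more than `delta`."""
    boundaries = []
    prev = None
    for end in range(n, len(text)):
        # Entropy of the character that follows the n characters ending at `end`.
        h = branching_entropy(successors, text[end - n:end])
        if prev is not None and h - prev > delta:
            boundaries.append(end)
        prev = h
    pieces, start = [], 0
    for b in boundaries:
        pieces.append(text[start:b])
        start = b
    pieces.append(text[start:])
    return pieces
```

To try it, build the counts once from a large unsegmented string with `train_successors(corpus, n)` and pass the result to `segment`. A single fixed `n` keeps the sketch short; how contexts of different lengths are combined, and how the threshold is set, are details the abstract does not specify.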
Zhihui Jin, Kumiko Tanaka-Ishii
Added 30 Oct 2010
Updated 30 Oct 2010
Type Conference
Year 2006
Where ACL
Authors Zhihui Jin, Kumiko Tanaka-Ishii