Unsupervized Word Segmentation: the Case for Mandarin Chinese

In this paper, we present an unsupervized segmentation system tested on Mandarin Chinese. Following Harris's Hypothesis in Kempe (1999) and Tanaka-Ishii's (2005) reformulation, we base our work on the Variation of Branching Entropy. We improve on Jin and Tanaka-Ishii (2006) by adding normalization and Viterbi decoding. This enables us to remove most of the thresholds and parameters from their model and to reach near state-of-the-art results (Wang et al., 2011) with a simpler system. We provide evaluation on different corpora available from the Segmentation Bake-off II (Emerson, 2005) and define a more precise topline for the task using cross-trained supervized systems available off-the-shelf (Zhang and Clark, 2010; Zhao and Kit, 2008; Huang and Zhao, 2007).
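
As a rough illustration of the quantity the abstract builds on, the sketch below computes right-branching entropy (the entropy of the character that follows a given n-gram) and its variation when the n-gram is extended by one character. It is not the authors' implementation: the function names, the `max_n` parameter, and the definition of the variation as H(x) minus H(x without its last character) are assumptions made for this example, and the normalization and Viterbi decoding steps mentioned in the abstract are not shown.

```python
import math
from collections import Counter, defaultdict


def right_branching_entropy(corpus, max_n=3):
    """Map each n-gram (length 1..max_n) to the entropy, in bits,
    of the character that immediately follows it in the corpus."""
    followers = defaultdict(Counter)
    for sentence in corpus:
        for i in range(len(sentence)):
            for n in range(1, max_n + 1):
                # Only count n-grams that have a following character.
                if i + n < len(sentence):
                    followers[sentence[i:i + n]][sentence[i + n]] += 1

    entropy = {}
    for gram, counts in followers.items():
        total = sum(counts.values())
        entropy[gram] = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
    return entropy


def entropy_variation(entropy, gram):
    """Assumed definition: H(gram) - H(gram minus its last character).
    A rise in entropy hints at a possible word boundary after `gram`."""
    if len(gram) < 2:
        raise ValueError("gram must contain at least two characters")
    return entropy[gram] - entropy[gram[:-1]]


# Hypothetical toy corpus, for illustration only.
toy_corpus = ["abab", "abac"]
toy_entropy = right_branching_entropy(toy_corpus)
print(entropy_variation(toy_entropy, "ba"))
```

A positive variation means the next character became harder to predict after extending the n-gram, which is the intuition behind Harris's Hypothesis as the abstract invokes it.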

Pierre Magistry, Benoît Sagot

ACL 2012 | Computational Linguistics | Ishii | Mandarin Chinese | Segmentation System |

Post Info

Added	29 Sep 2012
Updated	29 Sep 2012
Type	Journal
Year	2012
Where	ACL
Authors	Pierre Magistry, Benoît Sagot