
ACL
2009

Bayesian Unsupervised Word Segmentation with Nested Pitman-Yor Language Modeling

In this paper, we propose a new Bayesian model for fully unsupervised word segmentation, together with an efficient blocked Gibbs sampler combined with dynamic programming for inference. Our model is a nested hierarchical Pitman-Yor language model, in which a Pitman-Yor spelling model is embedded in the word model. We confirmed that it significantly outperforms previously reported results on both phonetic transcripts and standard datasets for Chinese and Japanese word segmentation. Our model can also be viewed as a way to construct an accurate word n-gram language model directly from the characters of an arbitrary language, without any "word" indications.
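The idea of a blocked Gibbs sampler with dynamic programming can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a heavily simplified model: a word *unigram* Pitman-Yor model instead of the paper's hierarchical n-gram model, a fixed uniform-character, geometric-length base measure standing in for the Pitman-Yor spelling model, one "table" per word type instead of full seating arrangements, and fixed hyperparameters. The dynamic-programming step shown is forward filtering followed by backward sampling over all segmentations of one sentence. All names (`PYUnigram`, `sample_segmentation`, `gibbs`, `max_len`, `n_chars`, `p_stop`) are hypothetical.

```python
import random
from collections import Counter


class PYUnigram:
    """Hypothetical, simplified Pitman-Yor word unigram model."""

    def __init__(self, d=0.5, theta=1.0, n_chars=50, p_stop=0.5):
        self.d, self.theta = d, theta              # discount and concentration
        self.n_chars, self.p_stop = n_chars, p_stop
        self.counts = Counter()                    # word type -> customer count

    def base(self, word):
        # Stand-in spelling model: uniform characters, geometric word length.
        n = len(word)
        return (1.0 / self.n_chars) ** n * self.p_stop * (1 - self.p_stop) ** (n - 1)

    def prob(self, word):
        # Pitman-Yor predictive probability, assuming one table per word type.
        c = sum(self.counts.values())
        t = len(self.counts)
        cw = self.counts[word]
        tw = 1 if cw > 0 else 0
        return ((cw - self.d * tw) + (self.theta + self.d * t) * self.base(word)) / (self.theta + c)

    def add(self, words):
        self.counts.update(words)

    def remove(self, words):
        self.counts.subtract(words)
        self.counts = +self.counts                 # drop types whose count reached zero


def sample_segmentation(sentence, model, max_len=4, rng=random):
    n = len(sentence)
    # Forward filtering: alpha[t] sums the probability of every segmentation of sentence[:t].
    alpha = [0.0] * (n + 1)
    alpha[0] = 1.0
    for t in range(1, n + 1):
        for k in range(1, min(max_len, t) + 1):
            alpha[t] += model.prob(sentence[t - k:t]) * alpha[t - k]
    # Backward sampling: draw the length of the last word, then move left.
    words, t = [], n
    while t > 0:
        ks = list(range(1, min(max_len, t) + 1))
        weights = [model.prob(sentence[t - k:t]) * alpha[t - k] for k in ks]
        k = rng.choices(ks, weights=weights)[0]
        words.append(sentence[t - k:t])
        t -= k
    return words[::-1]


def gibbs(sentences, iters=20, max_len=4, seed=0):
    rng = random.Random(seed)
    model = PYUnigram()
    segs = [[s] for s in sentences]                # start: each sentence is one "word"
    for seg in segs:
        model.add(seg)
    for _ in range(iters):
        for i, s in enumerate(sentences):
            model.remove(segs[i])                  # take this sentence out of the counts
            segs[i] = sample_segmentation(s, model, max_len, rng)
            model.add(segs[i])                     # add the newly sampled words back
    return segs
```

For example, `gibbs(["thedog", "thecat", "thedogsat"], iters=50)` returns one sampled segmentation per sentence. Resampling a whole sentence at once (the "block") lets the sampler jump between very different segmentations in a single step, which one-boundary-at-a-time sampling does poorly. Because `alpha` multiplies many small probabilities, a practical version would work in log space or rescale `alpha` per position to avoid underflow on long sentences.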
Daichi Mochihashi, Takeshi Yamada, Naonori Ueda
Added 16 Feb 2011
Updated 16 Feb 2011
Type Conference
Year 2009
Where ACL