
ACL
2015

Inducing Word and Part-of-Speech with Pitman-Yor Hidden Semi-Markov Models

We propose a nonparametric Bayesian model for joint unsupervised word segmentation and part-of-speech tagging from raw strings. Our model, which extends a previous model for word segmentation, is called a Pitman-Yor Hidden Semi-Markov Model (PYHSMM). It can be viewed as a method for building a class n-gram language model directly from strings while integrating character-level and word-level information. Experiments on standard Japanese, Chinese, and Thai datasets show that it outperforms previously reported results and achieves state-of-the-art accuracies. The model can also serve to analyze the structure of a language whose words are not identified a priori.
Kei Uchiumi, Hiroshi Tsukahara, Daichi Mochihashi
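To make the core idea concrete, the following is a minimal Python sketch of how a hidden semi-Markov model can treat segmentation and tagging jointly. Each path through the lattice of substrings fixes both the word boundaries and a class (part-of-speech) for every word, scored by a class bigram model p(z | previous z) times a class-conditional word probability p(word | z). The sketch uses forward filtering (summing over all segmentations and taggings) and backward sampling (drawing one segmentation and tagging from the posterior), a standard inference scheme for such lattices. It is an illustration under simplifying assumptions, not the authors' implementation. The Pitman-Yor word and character n-gram models are replaced by fixed, hypothetical toy distributions (CHAR_P, STOP, TRANS, EOS). All function names are hypothetical. Probabilities stay in linear space, which is fine only for short strings.

```python
import random
from typing import Callable, List, Optional, Tuple

Alpha = List[List[List[float]]]
# trans(prev_class, next_class): None as prev means BOS, None as next means EOS.
Trans = Callable[[Optional[int], Optional[int]], float]
Emit = Callable[[str, int], float]


def forward_filter(s: str, num_classes: int, max_len: int,
                   trans: Trans, emit: Emit) -> Alpha:
    """alpha[t][k][z] = p(s[:t], last word is s[t-k:t] with class z)."""
    n = len(s)
    alpha = [[[0.0] * num_classes for _ in range(max_len + 1)]
             for _ in range(n + 1)]
    for t in range(1, n + 1):
        for k in range(1, min(max_len, t) + 1):
            word = s[t - k:t]
            for z in range(num_classes):
                if t == k:  # the word starts the string: transition from BOS
                    prev = trans(None, z)
                else:       # sum over length j and class y of the preceding word
                    prev = sum(alpha[t - k][j][y] * trans(y, z)
                               for j in range(1, min(max_len, t - k) + 1)
                               for y in range(num_classes))
                alpha[t][k][z] = emit(word, z) * prev
    return alpha


def marginal_likelihood(alpha: Alpha, s: str, num_classes: int,
                        max_len: int, trans: Trans) -> float:
    """p(s), summed over every segmentation and tagging (ends with EOS)."""
    n = len(s)
    return sum(alpha[n][k][z] * trans(z, None)
               for k in range(1, min(max_len, n) + 1)
               for z in range(num_classes))


def backward_sample(alpha: Alpha, s: str, num_classes: int, max_len: int,
                    trans: Trans, rng: random.Random) -> List[Tuple[str, int]]:
    """Draw one (word, class) sequence from the posterior, right to left."""
    t, next_z = len(s), None  # None stands for the EOS state
    out: List[Tuple[str, int]] = []
    while t > 0:
        cands = [(k, z) for k in range(1, min(max_len, t) + 1)
                 for z in range(num_classes)]
        weights = [alpha[t][k][z] * trans(z, next_z) for k, z in cands]
        k, z = rng.choices(cands, weights=weights)[0]
        out.append((s[t - k:t], z))
        t, next_z = t - k, z
    return out[::-1]


# Hypothetical toy model: 2 classes over the alphabet {a, b}.
CHAR_P = [{"a": 0.7, "b": 0.3}, {"a": 0.2, "b": 0.8}]  # p(char | class)
STOP = 0.5                                              # geometric word length
TRANS = {None: [0.6, 0.4], 0: [0.5, 0.3], 1: [0.2, 0.6]}  # p(next | prev)
EOS = {0: 0.2, 1: 0.2}                                  # p(EOS | prev)


def emit(word: str, z: int) -> float:
    """Toy p(word | class): geometric length times class-specific chars."""
    p = STOP * (1 - STOP) ** (len(word) - 1)
    for c in word:
        p *= CHAR_P[z][c]
    return p


def trans(y: Optional[int], z: Optional[int]) -> float:
    return EOS[y] if z is None else TRANS[y][z]


if __name__ == "__main__":
    s = "abbaab"
    alpha = forward_filter(s, 2, 3, trans, emit)
    print(marginal_likelihood(alpha, s, 2, 3, trans))
    print(backward_sample(alpha, s, 2, 3, trans, random.Random(0)))
```

The forward pass costs O(n · L² · K²) for a string of length n, maximum word length L, and K classes. Because the word probability depends on the class, segmentation and tagging are resolved together rather than in a pipeline.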
Type Conference
Year 2015
Where ACL
Authors Kei Uchiumi, Hiroshi Tsukahara, Daichi Mochihashi