A comparison of unsupervised methods for Part-of-Speech Tagging in Chinese



We conduct a series of Part-of-Speech (POS) tagging experiments on the Chinese Penn Treebank using Expectation Maximization (EM), Variational Bayes (VB) and Gibbs Sampling (GS). First, we want to establish a baseline for unsupervised POS tagging in Chinese, which will facilitate future research in this area. Second, by comparing and analyzing the results for Chinese and English, we highlight some of the strengths and weaknesses of each algorithm in the POS tagging task and attempt to explain the differences based on a preliminary linguistic analysis. Compared to English, we find that all algorithms perform rather poorly in Chinese on 1-to-1 accuracy but are more competitive on many-to-1 accuracy. One possible explanation is the algorithms' inability to produce tags that match the desired tag count distribution.
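The two evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes the common definitions: many-to-1 maps each induced tag to the gold tag it co-occurs with most often, while 1-to-1 allows each gold tag to be claimed by at most one induced tag. The greedy 1-to-1 matching and the function names below are assumptions made for illustration; the abstract does not state how the paper computes the mapping.

```python
from collections import Counter


def many_to_one_accuracy(induced, gold):
    """Map every induced tag to its most frequent gold tag, then score."""
    if len(induced) != len(gold) or not gold:
        raise ValueError("sequences must be non-empty and of equal length")
    pair_counts = Counter(zip(induced, gold))
    best = {}  # induced tag -> count of its best-matching gold tag
    for (cluster, _tag), n in pair_counts.items():
        if n > best.get(cluster, 0):
            best[cluster] = n
    return sum(best.values()) / len(gold)


def one_to_one_accuracy(induced, gold):
    """Greedy 1-to-1 mapping (an assumption): each gold tag is used once."""
    if len(induced) != len(gold) or not gold:
        raise ValueError("sequences must be non-empty and of equal length")
    pair_counts = Counter(zip(induced, gold))
    used_clusters, used_tags = set(), set()
    correct = 0
    # Visit (induced, gold) pairs from most to least frequent;
    # ties are broken by the pair's string form to stay deterministic.
    ordered = sorted(pair_counts.items(), key=lambda kv: (-kv[1], str(kv[0])))
    for (cluster, tag), n in ordered:
        if cluster in used_clusters or tag in used_tags:
            continue
        used_clusters.add(cluster)
        used_tags.add(tag)
        correct += n
    return correct / len(gold)


# Toy data (hypothetical): two induced tags both split the gold tag "NN".
induced_tags = [0, 0, 0, 1, 1, 2]
gold_tags = ["NN", "NN", "VV", "NN", "NN", "VV"]
print(many_to_one_accuracy(induced_tags, gold_tags))
print(one_to_one_accuracy(induced_tags, gold_tags))
```

On this toy input the many-to-1 score is 5/6 and the greedy 1-to-1 score is 0.5. Induced tags 0 and 1 both align best with "NN", which many-to-1 rewards twice but 1-to-1 can reward only once. This is one way a mismatch between induced and gold tag count distributions can lower 1-to-1 accuracy while leaving many-to-1 accuracy comparatively high.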

Alex Cheng, Fei Xia, Jianfeng Gao


Algorithms | Chinese Penn Treebank | COLING 2010 | Computational Linguistics | Preliminary Linguistics Analysis


Post Info

Added	13 May 2011
Updated	13 May 2011
Type	Journal
Year	2010
Where	COLING
Authors	Alex Cheng, Fei Xia, Jianfeng Gao
