An Error-Driven Word-Character Hybrid Model for Joint Chinese Word Segmentation and POS Tagging

In this paper, we present a discriminative word-character hybrid model for joint Chinese word segmentation and POS tagging. Our word-character hybrid model offers high performance since it can handle both known and unknown words. We describe our strategies that yield a good balance for learning the characteristics of known and unknown words, and propose an error-driven policy that delivers such balance by acquiring examples of unknown words from particular errors in a training corpus. We describe an efficient framework for training our model based on the Margin Infused Relaxed Algorithm (MIRA), evaluate our approach on the Penn Chinese Treebank, and show that it achieves superior performance compared to the state-of-the-art approaches reported in the literature.
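For readers unfamiliar with MIRA, the following is a minimal sketch of a single-constraint (1-best) update over sparse feature counts. It is an illustration only, not the training procedure from the paper: the function name `mira_update`, the dictionary feature representation, the cap `C`, and the scalar `loss` argument are all assumptions made for this example.

```python
from collections import Counter


def mira_update(weights, gold_feats, pred_feats, loss, C=1.0):
    """Apply one 1-best MIRA step in place and return the step size tau.

    weights    : dict mapping feature name -> weight (hypothetical layout)
    gold_feats : dict of feature counts for the reference analysis
    pred_feats : dict of feature counts for the model's current best analysis
    loss       : non-negative cost of predicting pred instead of gold
    C          : upper bound on the step size (an assumed hyperparameter)
    """
    # Feature difference f(x, gold) - f(x, pred), dropping zero entries.
    diff = Counter(gold_feats)
    diff.subtract(pred_feats)
    diff = {k: v for k, v in diff.items() if v != 0}

    norm_sq = sum(v * v for v in diff.values())
    if norm_sq == 0:
        return 0.0  # identical feature vectors: nothing to update

    # Current margin: score(gold) - score(pred) under the present weights.
    margin = sum(weights.get(k, 0.0) * v for k, v in diff.items())

    # Smallest step that makes the margin at least the loss, capped at C.
    tau = min(C, max(0.0, (loss - margin) / norm_sq))

    for k, v in diff.items():
        weights[k] = weights.get(k, 0.0) + tau * v
    return tau
```

The update moves the weights only as far as needed for the reference analysis to outscore the prediction by a margin equal to the loss, which is the general idea behind margin-infused training.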

Canasai Kruengkrai, Kiyotaka Uchimoto, Jun'ichi Kazama, Yiou Wang, Kentaro Torisawa, Hitoshi Isahara

ACL 2009 | Chinese Word Segmentation | Computational Linguistics | Penn Chinese Treebank | Word-character Hybrid Model

» Joint Word Segmentation and POS Tagging Using a Single Perceptron

Post Info

Added	24 Feb 2011
Updated	24 Feb 2011
Type	Journal
Year	2009
Where	ACL
Authors	Canasai Kruengkrai, Kiyotaka Uchimoto, Jun'ichi Kazama, Yiou Wang, Kentaro Torisawa, Hitoshi Isahara

Sciweavers
