Sciweavers

EMNLP
2004

Chinese Part-of-Speech Tagging: One-at-a-Time or All-at-Once? Word-Based or Character-Based?

14 years 2 months ago
Chinese Part-of-Speech Tagging: One-at-a-Time or All-at-Once? Word-Based or Character-Based?
Chinese part-of-speech (POS) tagging assigns one POS tag to each word in a Chinese sentence. However, since words are not demarcated in a Chinese sentence, Chinese POS tagging requires word segmentation as a prerequisite. We could perform Chinese POS tagging strictly after word segmentation (one-at-a-time approach), or perform both word segmentation and POS tagging in a combined, single step simultaneously (all-atonce approach). Also, we could choose to assign POS tags on a word-by-word basis, making use of word features in the surrounding context (word-based), or on a character-by-character basis with character features (character-based). This paper presents an in-depth study on such issues of processing architecture and feature representation for Chinese POS tagging, within a maximum entropy framework. We found that while the all-at-once, characterbased approach is the best, the one-at-a-time, character-based approach is a worthwhile compromise, performing only slightly worse in ter...
Hwee Tou Ng, Jin Kiat Low
Added 30 Oct 2010
Updated 30 Oct 2010
Type Conference
Year 2004
Where EMNLP
Authors Hwee Tou Ng, Jin Kiat Low
Comments (0)