
ACL
2003

Improved Source-Channel Models for Chinese Word Segmentation

This paper presents a Chinese word segmentation system that uses improved source-channel models of Chinese sentence generation. Chinese words are defined as one of the following four types: lexicon words, morphologically derived words, factoids, and named entities. Our system provides a unified approach to the four fundamental features of word-level Chinese language processing: (1) word segmentation, (2) morphological analysis, (3) factoid detection, and (4) named entity recognition. The performance of the system is evaluated on a manually annotated test set, and is also compared with several state-of-the-art systems, taking into account the fact that the definition of Chinese words often varies from system to system.
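As a rough illustration of the general source-channel idea mentioned in the abstract, a segmenter can search for the word sequence with the highest probability among all sequences that could have produced the observed character string. The sketch below is not the authors' system. It is a minimal toy, assuming a hypothetical lexicon with made-up unigram probabilities, a single hypothetical factoid class (runs of ASCII digits), and a channel probability of 1 whenever a candidate matches the characters. Morphologically derived words and named entities are left out, and all names (`LEXICON`, `P_FACTOID`, `segment`) are illustrative.

```python
import math
import re

# Hypothetical toy lexicon with made-up unigram probabilities (not from the paper).
LEXICON = {"我": 0.2, "们": 0.05, "我们": 0.15, "有": 0.2, "本": 0.1, "书": 0.1}
# Hypothetical factoid class: a run of ASCII digits, with one made-up class probability.
FACTOID = re.compile(r"[0-9]+")
P_FACTOID = 0.1
MAX_WORD_LEN = 4  # assumed upper bound on candidate length, in characters


def candidates(sentence, start):
    """Yield (end, label, log_prob) for every candidate word beginning at start."""
    for end in range(start + 1, min(len(sentence), start + MAX_WORD_LEN) + 1):
        piece = sentence[start:end]
        if piece in LEXICON:
            yield end, "lexicon", math.log(LEXICON[piece])
        if FACTOID.fullmatch(piece):
            yield end, "factoid", math.log(P_FACTOID)


def segment(sentence):
    """Return the best list of (word, label) pairs, or None if nothing covers the sentence."""
    n = len(sentence)
    # best[i] holds (score, backpointer) for the best segmentation of sentence[:i].
    best = [(-math.inf, None)] * (n + 1)
    best[0] = (0.0, None)
    for start in range(n):
        score = best[start][0]
        if score == -math.inf:
            continue  # position not reachable by any segmentation
        for end, label, logp in candidates(sentence, start):
            if score + logp > best[end][0]:
                best[end] = (score + logp, (start, label))
    if best[n][0] == -math.inf:
        return None
    # Follow backpointers from the end of the sentence to recover the words.
    words, pos = [], n
    while pos > 0:
        start, label = best[pos][1]
        words.append((sentence[start:pos], label))
        pos = start
    return words[::-1]


print(segment("我们有12本书"))
```

In this toy, the digit run "12" is kept as one factoid because a single class probability beats the product of two, which loosely mirrors how treating word types as classes lets segmentation and factoid detection be decided in one search.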
Jianfeng Gao, Mu Li, Changning Huang
Added 31 Oct 2010
Updated 31 Oct 2010
Type Conference
Year 2003
Where ACL
Authors Jianfeng Gao, Mu Li, Changning Huang