IJCNLP
2004
Springer

The Use of SVM for Chinese New Word Identification

We present a study of new word identification (NWI) to improve the performance of a Chinese word segmenter. In this paper, the distribution and types of new words are discussed empirically. In particular, we focus on new words of two surface patterns, which account for more than 80% of the new words in our data sets: NW11 (a two-character new word) and NW21 (a bi-character word followed by a single character). NWI is defined as a binary classification problem, and a statistical learning approach based on an SVM classifier is used. Different features for NWI are explored, including the in-word probability of a character (IWP), the analogy between new words and lexicon words, an anti-word list, and frequency in documents. The experiments show that these features are useful for NWI. The F-scores we achieved for NWI are 64.4% and 54.7% for NW11 and NW21, respectively. The overall performance of the Chinese word segmenter could be improved by 24.5% in Roov and 6.5% in F-score in the PK-close test of the 1st SIG...
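As a rough illustration of the IWP feature mentioned in the abstract, the sketch below estimates, for each character, the fraction of its occurrences that fall inside multi-character words of a segmented corpus, and then builds a small feature vector for an NW11 candidate. The estimation formula, the function names (`in_word_probability`, `nw11_features`), the choice of features, and the toy corpus are all assumptions made for illustration; the abstract does not specify how IWP is computed or how it is fed to the SVM.

```python
from collections import Counter


def in_word_probability(segmented_corpus):
    """Estimate IWP(c) as count(c inside multi-character words) / count(c).

    Assumption: this ratio is one plausible reading of "in-word probability";
    the abstract does not give the exact formula.
    """
    total = Counter()    # all occurrences of each character
    in_word = Counter()  # occurrences inside words of length >= 2
    for sentence in segmented_corpus:
        for word in sentence:
            for ch in word:
                total[ch] += 1
                if len(word) > 1:
                    in_word[ch] += 1
    return {ch: in_word[ch] / total[ch] for ch in total}


def nw11_features(c1, c2, iwp, default=0.0):
    """Hypothetical feature vector for a two-character (NW11) candidate.

    Returns the IWP of each character and their product; unseen characters
    fall back to `default`.
    """
    p1 = iwp.get(c1, default)
    p2 = iwp.get(c2, default)
    return [p1, p2, p1 * p2]


# Toy segmented corpus (invented for this sketch): each sentence is a word list.
corpus = [
    ["我", "喜欢", "北京"],
    ["北京", "很", "大"],
    ["他", "喜欢", "大", "城市"],
    ["京", "是", "简称"],
]

iwp = in_word_probability(corpus)
features = nw11_features("京", "大", iwp)
```

A vector like `features` could serve as one input row to a binary classifier such as an SVM, alongside the other features the abstract lists (analogy to lexicon words, anti-word list, document frequency), which are not modeled here.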
Added 02 Jul 2010
Updated 02 Jul 2010
Type Conference
Year 2004
Where IJCNLP
Authors Hongqiao Li, Changning Huang, Jianfeng Gao, Xiaozhong Fan