Sciweavers

ACL
2007

Chinese Segmentation with a Word-Based Perceptron Algorithm

14 years 1 months ago
Chinese Segmentation with a Word-Based Perceptron Algorithm
Standard approaches to Chinese word segmentation treat the problem as a tagging task, assigning labels to the characters in the sequence indicating whether the character marks a word boundary. Discriminatively trained models based on local character features are used to make the tagging decisions, with Viterbi decoding finding the highest scoring segmentation. In this paper we propose an alternative, word-based segmentor, which uses features based on complete words and word sequences. The generalized perceptron algorithm is used for discriminative training, and we use a beamsearch decoder. Closed tests on the first and second SIGHAN bakeoffs show that our system is competitive with the best in the literature, achieving the highest reported F-scores for a number of corpora.
Yue Zhang 0004, Stephen Clark
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2007
Where ACL
Authors Yue Zhang 0004, Stephen Clark
Comments (0)