
COLING
2010

Word-based and Character-based Word Segmentation Models: Comparison and Combination

We present a theoretical and empirical comparative analysis of the two dominant categories of approaches to Chinese word segmentation: word-based models and character-based models. We show that, despite similar overall performance, the two models produce different distributions of segmentation errors, and that these differences can be explained by theoretical properties of the two models. We further exploit this analysis to improve segmentation accuracy by integrating a word-based segmenter and a character-based segmenter, and we propose a Bootstrap Aggregating (bagging) model for this purpose. By letting multiple segmenters vote, our model consistently improves segmentation on the four data sets from the second SIGHAN bakeoff.
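The bagging-and-voting idea can be illustrated with a minimal sketch. It assumes that each segmenter outputs a list of words for the same sentence and that votes are cast per character boundary, keeping a boundary only when a strict majority of segmenters place one there. The helpers `bootstrap_sample`, `boundaries`, and `vote` are hypothetical names introduced here; this is an illustration of generic bagging with majority voting, not the exact procedure used in the paper.

```python
import random
from collections import Counter


def bootstrap_sample(train, rng):
    """Draw len(train) sentences with replacement (one bagging replicate).
    Each replicate would be used to train a separate segmenter (assumption)."""
    return [rng.choice(train) for _ in range(len(train))]


def boundaries(words):
    """Return the set of character offsets where a word ends,
    excluding the end of the sentence."""
    cuts, pos = set(), 0
    for w in words[:-1]:
        pos += len(w)
        cuts.add(pos)
    return cuts


def vote(segmentations):
    """Combine several segmentations of the same text by majority vote
    on each word boundary (hypothetical voting scheme)."""
    text = "".join(segmentations[0])
    # All segmenters must have segmented the same character sequence.
    assert all("".join(s) == text for s in segmentations)
    counts = Counter(c for s in segmentations for c in boundaries(s))
    # Keep a boundary only if more than half of the segmenters agree on it.
    keep = sorted(c for c, n in counts.items() if 2 * n > len(segmentations))
    words, start = [], 0
    for cut in keep + [len(text)]:
        words.append(text[start:cut])
        start = cut
    return words


if __name__ == "__main__":
    outputs = [["北京", "大学"], ["北京", "大", "学"], ["北", "京", "大学"]]
    print(vote(outputs))  # prints ['北京', '大学']
```

Here the boundary after 北京 is proposed by all three segmenters and survives, while the boundaries proposed by only one segmenter each are discarded. Voting on boundaries rather than on whole words guarantees that the combined output is always a valid segmentation of the input string.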
Weiwei Sun
Added: 13 May 2011
Updated: 13 May 2011
Type: Conference
Year: 2010
Where: COLING
Authors: Weiwei Sun