Word-based and Character-based Word Segmentation Models: Comparison and Combination



We present a theoretical and empirical comparative analysis of the two dominant categories of approaches to Chinese word segmentation: word-based models and character-based models. We show that, despite similar overall performance, the two models produce different distributions of segmentation errors, in a way that can be explained by their theoretical properties. The analysis is further exploited to improve segmentation accuracy by integrating a word-based segmenter and a character-based segmenter. We propose a Bootstrap Aggregating (bagging) model: by letting multiple segmenters vote, it improves segmentation consistently on the four data sets from the second SIGHAN bakeoff.
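The voting idea in the abstract can be illustrated with a minimal sketch. The abstract does not specify how votes are cast, so this assumes a simple majority vote on each gap between adjacent characters; the function names (`bootstrap_sample`, `to_boundaries`, `from_boundaries`, `vote`) are hypothetical and not taken from the paper, and training the individual segmenters is left out.

```python
import random

def bootstrap_sample(sentences, seed=0):
    """Draw a bootstrap replicate: sample with replacement, same size as the input.
    In bagging, each base segmenter would be trained on one such replicate."""
    rng = random.Random(seed)
    return rng.choices(sentences, k=len(sentences))

def to_boundaries(words):
    """Turn a segmentation (list of non-empty words) into 0/1 flags,
    one per gap between adjacent characters; 1 marks a word boundary."""
    flags = []
    for word in words:
        flags.extend([0] * (len(word) - 1))
        flags.append(1)
    return tuple(flags[:-1])  # drop the boundary after the last character

def from_boundaries(text, flags):
    """Rebuild the word list from the text and its per-gap boundary flags."""
    words, start = [], 0
    for i, flag in enumerate(flags, start=1):
        if flag:
            words.append(text[start:i])
            start = i
    words.append(text[start:])
    return words

def vote(text, segmentations):
    """Assumed scheme: keep a boundary when a strict majority of segmenters propose it."""
    all_flags = [to_boundaries(seg) for seg in segmentations]
    n = len(all_flags)
    merged = [1 if 2 * sum(column) > n else 0 for column in zip(*all_flags)]
    return from_boundaries(text, merged)

# Three hypothetical segmenter outputs for the same five-character string.
text = "北京大学生"
outputs = [
    ["北京", "大学生"],
    ["北京大学", "生"],
    ["北京", "大学", "生"],
]
result = vote(text, outputs)
print(result)  # ['北京', '大学', '生']
```

Voting per gap rather than per whole sentence lets the combined output differ from every individual segmenter, which is one plausible way complementary error distributions could be exploited.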

Weiwei Sun


Bootstrap Aggregating Model | Chinese Word Segmentation | COLING 2010 | Computational Linguistics | Empirical Comparative Analysis


Post Info

Added	13 May 2011
Updated	13 May 2011
Type	Journal
Year	2010
Where	COLING
Authors	Weiwei Sun
