ACL
2015

Synthetic Word Parsing Improves Chinese Word Segmentation

We present a novel solution to improve the performance of Chinese word segmentation (CWS) using a synthetic word parser. The parser analyses the internal structure of words, and attempts to convert out-of-vocabulary words (OOVs) into in-vocabulary fine-grained sub-words. We propose a pipeline CWS system that first predicts this fine-grained segmentation, then chunks the output to reconstruct the original word segmentation standard. We achieve competitive results on the PKU and MSR datasets, with substantial improvements in OOV recall.
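The two-stage pipeline described in the abstract can be illustrated with a toy sketch. This is not the authors' system: the greedy maximum-matching segmenter, the hand-written B/I tags, the tiny vocabulary, and the function names `fine_grained_segment` and `chunk` are all assumptions made for illustration, standing in for the trained models that would predict the fine-grained segmentation and the chunk labels.

```python
def fine_grained_segment(text, subword_vocab, max_len=4):
    """Stage 1 (toy stand-in): greedy forward maximum matching
    against a vocabulary of fine-grained sub-words."""
    pieces, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Fall back to a single character when nothing longer matches.
            if size == 1 or candidate in subword_vocab:
                pieces.append(candidate)
                i += size
                break
    return pieces


def chunk(pieces, tags):
    """Stage 2: merge sub-words back into words.
    'B' starts a new word, 'I' continues the previous one."""
    words = []
    for piece, tag in zip(pieces, tags):
        if tag == "B" or not words:
            words.append(piece)
        else:
            words[-1] += piece
    return words


# Hypothetical vocabulary: the full word "总统府" is out of vocabulary,
# but its sub-words "总统" and "府" are in vocabulary.
subword_vocab = {"他", "去", "总统", "府"}

pieces = fine_grained_segment("他去总统府", subword_vocab)
print(pieces)  # ['他', '去', '总统', '府']

# Hand-written tags; a real system would predict these with a chunker.
tags = ["B", "B", "B", "I"]
print(chunk(pieces, tags))  # ['他', '去', '总统府']
```

In this sketch the unseen word is recovered because both of its parts are known sub-words and the chunking step rejoins them, which is the intuition behind the OOV recall gains the abstract reports.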
Fei Cheng, Kevin Duh, Yuji Matsumoto
Added 13 Apr 2016
Updated 13 Apr 2016
Type Journal
Year 2015
Where ACL