Sciweavers

ACL
2015

The Discovery of Natural Typing Annotations: User-produced Potential Chinese Word Delimiters

A human-labeled corpus is indispensable for training supervised word segmenters. However, labeling a corpus manually is time-consuming and labor-intensive. When typing Chinese text with Pinyin, people usually need to press the space bar or a numeric key to choose among words because of homophones, and these keystrokes can be viewed as a cue for segmentation. We argue that such a process can be used to build a labeled corpus in a more natural way. Thus, in this paper, we investigate Natural Typing Annotations (NTAs), the potential word delimiters that users produce while typing Chinese. A detailed analysis of over three hundred user-produced texts containing NTAs reveals that high-quality NTAs mostly agree with gold segmentation and, consequently, can be used to improve the performance of a supervised word segmentation model on out-of-domain text. Experiments show that a classification model combined with a voting mechanism can reliably identify the high-quality NTA texts that are mor...
Added 13 Apr 2016
Updated 13 Apr 2016
Type Journal
Year 2015
Where ACL
Authors Dakui Zhang, Yu Mao, Yang Liu, Hanshi Wang, Chuyuan Wei, Shiping Tang