
ACL
2009

Automatic Adaptation of Annotation Standards: Chinese Word Segmentation and POS Tagging - A Case Study

Manually annotated corpora are valuable but scarce resources, yet for many annotation tasks, such as treebanking and sequence labeling, there exist multiple corpora with different and incompatible annotation guidelines or standards. This seems a great waste of human effort, and it would be desirable to automatically adapt one annotation standard to another. We present a simple yet effective strategy that transfers knowledge from a differently annotated corpus to the corpus with the desired annotation. We test the efficacy of this method in the context of Chinese word segmentation and part-of-speech (POS) tagging, where no segmentation or POS tagging standard is widely accepted, owing to the lack of morphology in Chinese. Experiments show that adaptation from the much larger People's Daily corpus to the smaller but more popular Penn Chinese Treebank results in significant improvements in both segmentation and tagging accuracy (error reductions of 30.2% and 14%, respectively), which i...
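The abstract does not spell out how the knowledge is transferred. One common way to realize this kind of cross-standard transfer is feature-based stacking: train a tagger on the source-standard corpus, then train the target-standard tagger with the source tagger's predictions as extra "guide" features, so the target model can learn how the two standards correspond. The sketch below assumes that scheme and casts segmentation as per-character B/M/E/S tagging with a greedy multiclass perceptron. It is an illustration under those assumptions, not necessarily the authors' exact method, and every function name, feature template, and toy sentence in it is hypothetical.

```python
from collections import defaultdict

TAGS = ("B", "M", "E", "S")  # begin / middle / end of a multi-char word, single-char word


def words_to_tags(words):
    """Turn a segmented sentence (list of words) into one B/M/E/S tag per character."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


def tags_to_words(chars, tags):
    """Inverse of words_to_tags: rebuild words from characters and their tags."""
    words, cur = [], ""
    for c, t in zip(chars, tags):
        cur += c
        if t in ("E", "S"):
            words.append(cur)
            cur = ""
    if cur:  # tolerate an ill-formed final tag sequence
        words.append(cur)
    return words


def char_features(chars, i, guide=None):
    """Character-window features; if a guide tag sequence is given, the
    source-standard tagger's prediction for this character is added as features."""
    prev = chars[i - 1] if i > 0 else "<s>"
    nxt = chars[i + 1] if i + 1 < len(chars) else "</s>"
    feats = ["c0=" + chars[i], "c-1=" + prev, "c+1=" + nxt, "c-1c0=" + prev + chars[i]]
    if guide is not None:
        feats += ["g=" + guide[i], "g,c0=" + guide[i] + chars[i]]
    return feats


class Perceptron:
    """Multiclass perceptron over sparse string features (greedy, per character)."""

    def __init__(self):
        self.w = defaultdict(float)

    def predict(self, feats):
        return max(TAGS, key=lambda t: sum(self.w.get((f, t), 0.0) for f in feats))

    def train(self, examples, epochs=10):
        for _ in range(epochs):
            for feats, gold in examples:
                pred = self.predict(feats)
                if pred != gold:  # standard perceptron update on a mistake
                    for f in feats:
                        self.w[(f, gold)] += 1.0
                        self.w[(f, pred)] -= 1.0
        return self


def tag(model, chars, guide_fn=None):
    """Tag every character of a raw string, optionally with guide features."""
    guide = guide_fn(chars) if guide_fn else None
    return [model.predict(char_features(chars, i, guide)) for i in range(len(chars))]


def make_examples(corpus, guide_fn=None):
    """Flatten a segmented corpus into (features, gold tag) training pairs."""
    examples = []
    for words in corpus:
        chars = "".join(words)
        gold = words_to_tags(words)
        guide = guide_fn(chars) if guide_fn else None
        examples += [(char_features(chars, i, guide), gold[i]) for i in range(len(chars))]
    return examples


def adapt(source_corpus, target_corpus):
    """Stacking-style transfer: train on the source standard, then train the
    target-standard model with the source model's outputs as guide features."""
    source_model = Perceptron().train(make_examples(source_corpus))

    def guide_fn(chars):
        return tag(source_model, chars)

    target_model = Perceptron().train(make_examples(target_corpus, guide_fn))
    return lambda chars: tags_to_words(chars, tag(target_model, chars, guide_fn))


if __name__ == "__main__":
    # Hypothetical toy corpora: the "source" standard splits a personal name into
    # surname + given name, while the "target" standard keeps it as one word.
    source = [["王", "小明", "来", "了"], ["李", "大伟", "走", "了"]]
    target = [["王小明", "来", "了"]]
    segment = adapt(source, target)
    print(segment("李大伟来了"))
```

The design choice here is that the large source corpus never has to be converted to the target standard; it only shapes the guide features, and the small target corpus teaches the target model how much to trust them.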
Wenbin Jiang, Liang Huang, Qun Liu
Added: 16 Feb 2011
Updated: 16 Feb 2011
Type: Conference paper
Year: 2009
Where: ACL
Authors: Wenbin Jiang, Liang Huang, Qun Liu