Automatic Adaptation of Annotation Standards: Chinese Word Segmentation and POS Tagging - A Case Study

Manually annotated corpora are valuable but scarce resources, yet for many annotation tasks, such as treebanking and sequence labeling, there exist multiple corpora with different and incompatible annotation guidelines or standards. This seems to be a great waste of human effort, and it would be useful to automatically adapt one annotation standard to another. We present a simple yet effective strategy that transfers knowledge from a differently annotated corpus to the corpus with the desired annotation. We test the efficacy of this method in the context of Chinese word segmentation and part-of-speech tagging, where no segmentation and POS tagging standards are widely accepted due to the lack of morphology in Chinese. Experiments show that adaptation from the much larger People's Daily corpus to the smaller but more popular Penn Chinese Treebank results in significant improvements in both segmentation and tagging accuracies (with error reductions of 30.2% and 14%, respectively), which i...
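
The error reductions quoted in the abstract are relative figures. As a rough illustration of how such a figure is usually computed, here is a minimal sketch. It assumes scores expressed as fractions in [0, 1] and uses made-up baseline and adapted scores; the function name and the numbers are hypothetical and are not taken from the paper.

```python
def relative_error_reduction(baseline_score: float, adapted_score: float) -> float:
    """Return the fraction of the baseline error removed by the adapted model.

    Scores (accuracy or F-measure) are assumed to lie in [0, 1],
    so the error is taken as 1 - score.
    """
    baseline_error = 1.0 - baseline_score
    adapted_error = 1.0 - adapted_score
    if baseline_error <= 0.0:
        raise ValueError("baseline has no error left to reduce")
    return (baseline_error - adapted_error) / baseline_error


# Hypothetical scores, chosen only to illustrate the arithmetic.
reduction = relative_error_reduction(baseline_score=0.90, adapted_score=0.93)
print(f"relative error reduction: {reduction:.1%}")
```

With these illustrative numbers, the error falls from 10% to 7%, which is a relative reduction of about 30% even though the score itself rises by only three points.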

Wenbin Jiang, Liang Huang, Qun Liu

ACL 2009 | Annotation | Chinese Word Segmentation | Computational Linguistics | Incompatible Annotation Guidelines |

Added	16 Feb 2011
Updated	16 Feb 2011
Type	Journal
Year	2009
Where	ACL
Authors	Wenbin Jiang, Liang Huang, Qun Liu
