Predicting Morphological Types of Chinese Bi-Character Words by Machine Learning Approaches

This paper presented an overview of the morphological types of Chinese bi-character words and proposed a set of features that machine learning approaches can use to predict these types from information about the component characters. First, eight morphological types were defined, and 6,500 Chinese bi-character words were annotated with these types. After pre-processing, 6,178 words were selected to construct a corpus named Reduced Set. We analyzed Reduced Set and conducted an inter-annotator agreement test; the average kappa value of 0.67 indicates substantial agreement. Second, because this paper considers the morphological type of a bi-character word to be strongly related to the parts of speech of its component characters, we proposed a set of features that can be extracted directly from dictionaries to indicate each character's "tendency" toward particular parts of speech. Finally, we used these features and adopted three machine learning algorithms, SVM, CRF, and Na

Ting-Hao Huang, Lun-Wei Ku, Hsin-Hsi Chen

Bi-character Words | Chinese Bi-character Words | Education | LREC 2010 | Morphological Types

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2010
Where	LREC
Authors	Ting-Hao Huang, Lun-Wei Ku, Hsin-Hsi Chen
