Sciweavers

CIKM
2007
Springer

A Segment-Based Hidden Markov Model for Real-Setting Pinyin-to-Chinese Conversion

The hidden Markov model (HMM) is frequently used for Pinyin-to-Chinese conversion, but it captures only the dependency on the preceding character. Higher-order Markov models can bring higher accuracy, but are computationally unaffordable on an average PC. We propose a segment-based hidden Markov model (SHMM), which has the same order of complexity as a first-order HMM but achieves higher decoding accuracy. SHMM distinguishes a word from a bigram connecting two words, and assigns a reasonable probability to each word as a whole. It is more powerful than HMM at decoding words of more than two characters. We conduct a comprehensive Pinyin-to-Chinese conversion evaluation on the Lancaster corpus. The experiment shows that perfect-sentence accuracy improves from 34.7% (HMM) to 43.3% (SHMM), and one-error sentence accuracy increases from 72.7% to 78.3%. Furthermore, SHMM can seamlessly integrate with pinyin typing correction, acronym pinyin input, user-defined words, and self-adaptive learning a...
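As a rough illustration of the first-order baseline the abstract refers to, the sketch below runs Viterbi decoding over candidate characters for a pinyin sequence. It is not the authors' SHMM, and it makes several assumptions: the tables `CANDIDATES`, `START`, and `TRANS`, the `UNSEEN` floor, and the function name `viterbi` are hypothetical; all probabilities are invented; and the emission probability of a syllable given a candidate character is taken to be 1.

```python
import math

# Hypothetical toy model: every number below is invented for illustration.
# Candidate characters per pinyin syllable (emission probability assumed to be 1).
CANDIDATES = {
    "zhong": ["中", "种"],
    "guo": ["国", "过"],
}

# log P(first character); invented values.
START = {"中": math.log(0.6), "种": math.log(0.4)}

# log P(current character | preceding character); invented values.
TRANS = {
    ("中", "国"): math.log(0.9),
    ("中", "过"): math.log(0.1),
    ("种", "国"): math.log(0.2),
    ("种", "过"): math.log(0.8),
}

UNSEEN = math.log(1e-6)  # floor for entries missing from the toy tables


def viterbi(syllables):
    """Return the most probable character list under a first-order model."""
    if not syllables:
        return []
    # best[c] = (log-probability of the best path ending in c, that path)
    best = {c: (START.get(c, UNSEEN), [c]) for c in CANDIDATES[syllables[0]]}
    for syllable in syllables[1:]:
        step = {}
        for cur in CANDIDATES[syllable]:
            # Only the immediately preceding character is consulted,
            # which is the limitation the abstract points out.
            score, path = max(
                ((s + TRANS.get((prev, cur), UNSEEN), p)
                 for prev, (s, p) in best.items()),
                key=lambda item: item[0],
            )
            step[cur] = (score, path + [cur])
        best = step
    return max(best.values(), key=lambda item: item[0])[1]


if __name__ == "__main__":
    print("".join(viterbi(["zhong", "guo"])))
```

Each step costs time proportional to the square of the number of candidates per syllable. A second-order model would have to track pairs of preceding characters, which is the kind of growth in cost the abstract describes as unaffordable on an average PC.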
Added: 13 Aug 2010
Updated: 13 Aug 2010
Type: Conference
Year: 2007
Where: CIKM
Authors: Xiaohua Zhou, Xiaohua Hu, Xiaodan Zhang, Xiajiong Shen