A segment-based hidden Markov model for real-setting Pinyin-to-Chinese conversion



Hidden Markov models (HMMs) are frequently used for Pinyin-to-Chinese conversion, but a first-order HMM captures only the dependency on the preceding character. Higher-order Markov models can bring higher accuracy, but are computationally unaffordable on an average PC. We propose a segment-based hidden Markov model (SHMM), which has the same order of complexity as a first-order HMM but yields higher decoding accuracy. SHMM distinguishes a word from a bigram connecting two words, and assigns a reasonable probability to each word as a whole. It is more powerful than HMM at decoding words of more than two characters. We conduct a comprehensive Pinyin-to-Chinese conversion evaluation on the Lancaster corpus. The experiment shows that perfect sentence accuracy improves from 34.7% (HMM) to 43.3% (SHMM), and one-error sentence accuracy increases from 72.7% to 78.3%. Furthermore, SHMM can seamlessly integrate with pinyin typing correction, acronym pinyin input, user-defined words, and self-adaptive learning a...
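For readers unfamiliar with the baseline, the first-order HMM decoding that the abstract compares against can be sketched with Viterbi search. This is a minimal illustration only, not the authors' implementation and not SHMM: the candidate table, the probabilities, and the names `CANDIDATES`, `START`, `TRANS`, `FLOOR`, and `viterbi` are all invented for the example, and the emission probability P(pinyin | character) is assumed to be 1 for every listed candidate.

```python
import math

# Toy model: every number below is invented for illustration only.
# CANDIDATES maps a pinyin syllable to the characters it may stand for.
CANDIDATES = {
    "zhong": ["中", "钟"],
    "guo": ["国", "过"],
}
# P(first character of the sentence)
START = {"中": 0.6, "钟": 0.4}
# P(current character | previous character): the first-order dependency
TRANS = {
    ("中", "国"): 0.9, ("中", "过"): 0.1,
    ("钟", "国"): 0.2, ("钟", "过"): 0.8,
}
FLOOR = 1e-6  # fallback probability for pairs missing from the tables


def viterbi(syllables):
    """Return the most probable character string for a list of syllables."""
    # best[c] = (log-probability, path) of the best path ending in character c
    best = {
        c: (math.log(START.get(c, FLOOR)), [c])
        for c in CANDIDATES[syllables[0]]
    }
    for syl in syllables[1:]:
        step = {}
        for cur in CANDIDATES[syl]:
            # Keep only the best predecessor for each current character.
            step[cur] = max(
                (score + math.log(TRANS.get((prev, cur), FLOOR)), path + [cur])
                for prev, (score, path) in best.items()
            )
        best = step
    _, path = max(best.values())
    return "".join(path)


print(viterbi(["zhong", "guo"]))
```

Each step looks back only one character, which is the limitation the abstract attributes to the first-order model; the cost grows with the sentence length times the square of the number of candidates per syllable.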

Xiaohua Zhou, Xiaohua Hu, Xiaodan Zhang, Xiajiong Shen


CIKM 2007 | Hidden Markov Model | Information Management | Segment-based Hidden Markov | Sentence Accuracy |


Post Info

Added	13 Aug 2010
Updated	13 Aug 2010
Type	Conference
Year	2007
Where	CIKM
Authors	Xiaohua Zhou, Xiaohua Hu, Xiaodan Zhang, Xiajiong Shen
