Computer-Assisted Pronunciation Training System (CAPT) has become an important learning aid in second language (L2) learning. Our approach to CAPT is based on the use of phonological rules to capture language transfer effects that may cause mispronunciations. This paper presents an approach for automatic derivation of phonological rules from L2 speech. The rules are used to generate an extended recognition network (ERN) that captures the canonical pronunciations of words, as well as the possible mispronunciations. The ERN is used with automatic speech recognition for mispronunciation detection. Experimentation with an L2 speech corpus that contains recordings from 100 speakers aims to compare the automatically derived rules with manually authored rules. Comparable performance is achieved in mispronunciation detection (i.e. telling which phone is wrong). The automatically derived rules also offer improved performance in diagnostic accuracy (i.e. identify how the phone is wrong).
Wai Kit Lo, Shuang Zhang, Helen M. Meng