An investigation of subspace modeling for phonetic and speaker variability in automatic speech recognition

14 years 4 months ago


This paper investigates the impact of subspace-based techniques for acoustic modeling in automatic speech recognition (ASR). Many well-known approaches to subspace-based speaker adaptation represent sources of variability as a projection within a low-dimensional subspace. A new approach to acoustic modeling in ASR, referred to as the subspace-based Gaussian mixture model (SGMM), represents phonetic variability as a set of projections applied at the state level in a hidden Markov model (HMM) based acoustic model. The impact of the SGMM in modeling these intrinsic sources of variability is evaluated on a continuous speech recognition (CSR) task. The SGMM is shown to provide an 18% reduction in word error rate (WER) for speaker-independent (SI) ASR relative to the continuous density HMM (CDHMM) in the Resource Management CSR domain. The SI performance obtained from the SGMM also represents a 5% reduction in WER relative to subspace-based speaker adaptation in an unsupervised s...
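To make the idea of state-level projections concrete, the following is a minimal sketch, not the authors' implementation. The abstract gives no formulas, so the sketch assumes the commonly published SGMM parameterization: each HMM state j holds a low-dimensional vector v_j, and each shared Gaussian i holds a projection matrix M_i and a weight vector w_i, giving a mean M_i v_j and mixture weights from a softmax over w_i . v_j. All names (sgmm_state_params, M, w, v_j) and the toy values are hypothetical.

```python
import math

def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]

def sgmm_state_params(M, w, v_j):
    """Derive one state's Gaussian means and mixture weights.

    M   : list of I projection matrices, each D x S (shared across states)
    w   : list of I weight-projection vectors, each of length S (shared)
    v_j : state vector of length S (the only state-specific parameter here)
    """
    # Mean of shared Gaussian i in this state: mu_ji = M_i v_j
    means = [matvec(M_i, v_j) for M_i in M]
    # Mixture weights: softmax over w_i . v_j, with max subtracted for stability
    logits = [sum(a * b for a, b in zip(w_i, v_j)) for w_i in w]
    peak = max(logits)
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    return means, weights

# Toy setup (assumed values): I=2 shared Gaussians, D=3 features, S=2 subspace dims
M = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[2.0, 0.0], [0.0, 2.0], [0.0, 0.0]],
]
w = [[0.0, 0.0], [0.0, 0.0]]
v_j = [0.5, -1.0]

means, weights = sgmm_state_params(M, w, v_j)
```

In this parameterization only v_j differs between states, which is why the number of state-specific parameters stays small compared with a CDHMM that stores full means for every state. Covariances, which are typically shared across states in this formulation, are omitted from the sketch.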

Richard C. Rose, Shou-Chun Yin, Yun Tang
