Sciweavers

ICMCS
2005
IEEE

Rapid Feature Space Speaker Adaptation for Multi-Stream HMM-Based Audio-Visual Speech Recognition

14 years 6 months ago
Rapid Feature Space Speaker Adaptation for Multi-Stream HMM-Based Audio-Visual Speech Recognition
Multi-stream hidden Markov models (HMMs) have recently been very successful in audio-visual speech recognition, where the audio and visual streams are fused at the final decision level. In this paper we investigate fast feature space speaker adaptation using multi-stream HMMs for audio-visual speech recognition. In particular, we focus on studying the performance of feature-space maximum likelihood linear regression (fMLLR), a fast and effective method for estimating feature space transforms. Unlike the common speaker adaptation techniques of MAP or MLLR, fMLLR does not change the audio or visual HMM parameters, but simply applies a single transform to the testing features. We also address the problem of fast and robust on-line fMLLR adaptation using feature space maximum a posterior linear regression (fMAPLR). Adaptation experiments are reported on the IBM infrared headset audio-visual database. On average for a 20-speaker¢ hour independent test set, the multi-stream fMLLR achieves...
Jing Huang, Etienne Marcheret, Karthik Visweswaria
Added 24 Jun 2010
Updated 24 Jun 2010
Type Conference
Year 2005
Where ICMCS
Authors Jing Huang, Etienne Marcheret, Karthik Visweswariah
Comments (0)