We recently proposed a method for adapting HMMs to noisy environments called Linear Spline Interpolation (LSI). LSI uses linear spline regression to model the relationship between clean and noisy speech features. In the original algorithm, stereo training data was used to learn the spline parameters that minimize the error between the predicted and actual noisy speech features. The estimated splines are then used at runtime to adapt the clean HMMs to the current environment. Although this approach gives good results, its performance is limited because the splines are trained independently of the speech recognizer and may therefore be suboptimal for adaptation. In this work, we introduce a new Generalized EM algorithm that estimates the spline parameters using the speech recognizer itself. Experiments on the Aurora 2 task show that using LSI adaptation with splines trained in this manner results in a 20% improvement over the original LSI algorithm t...
Michael L. Seltzer, Alex Acero
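To make the original stereo-data training step concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes that each feature dimension is modeled independently, that the knots are fixed and evenly spaced, and that the error criterion is plain least squares. The function names (`hat_basis`, `fit_linear_spline`, `apply_spline`) and the synthetic log-add data are hypothetical. Each spline is parameterized by its values at the knots, so fitting it reduces to an ordinary linear least-squares problem. The recognizer-based Generalized EM training is not shown.

```python
import numpy as np


def hat_basis(x, knots):
    """Design matrix of piecewise-linear "hat" basis functions.

    Column k equals 1 at knots[k] and falls linearly to 0 at the
    neighbouring knots, so any linear spline on these knots equals
    B @ values, where values[k] is the spline's value at knots[k].
    np.interp clamps inputs outside [knots[0], knots[-1]], so the
    spline is extrapolated as a constant beyond the end knots.
    """
    x = np.asarray(x, dtype=float)
    B = np.empty((x.size, knots.size))
    for k in range(knots.size):
        one_hot = np.zeros(knots.size)
        one_hot[k] = 1.0
        B[:, k] = np.interp(x, knots, one_hot)
    return B


def fit_linear_spline(clean, noisy, knots):
    """Least-squares spline values at the knots, fit on stereo pairs
    (clean[i], noisy[i]) for a single feature dimension (assumption)."""
    B = hat_basis(clean, knots)
    values, *_ = np.linalg.lstsq(B, np.asarray(noisy, dtype=float), rcond=None)
    return values


def apply_spline(x, knots, values):
    """Predict noisy features from clean ones with the fitted spline."""
    return np.interp(x, knots, values)


if __name__ == "__main__":
    # Hypothetical stereo data for one feature dimension: a smooth,
    # nonlinear clean-to-noisy mapping (log-add with a constant 3.0).
    clean = np.linspace(0.0, 10.0, 101)
    noisy = np.logaddexp(clean, 3.0)
    knots = np.linspace(0.0, 10.0, 6)
    values = fit_linear_spline(clean, noisy, knots)
    err = np.max(np.abs(apply_spline(clean, knots, values) - noisy))
    print(f"spline values at knots: {np.round(values, 3)}")
    print(f"max abs prediction error: {err:.3f}")
```

Because the knot locations are fixed, the spline is linear in its parameters and the error-minimizing fit is a single `lstsq` call; moving the knots or replacing the squared-error criterion with a different objective would generally require an iterative procedure instead.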