The most popular speech feature extractor used in automatic speech recognition (ASR) systems today is the mel-frequency cepstral coefficient (MFCC) algorithm. Introduced in 1980, this filter-bank-based algorithm eventually replaced linear prediction cepstral coefficients (LPCC) as the premier front end, primarily because of MFCC’s superior robustness to additive noise. However, MFCC does not approximate the critical bandwidth of the human auditory system. We propose a novel scheme for decoupling filter bandwidth from other filter bank parameters, and we demonstrate improved noise robustness over three versions of MFCC through HMM-based experiments with the English digits in various noise environments.
Mark D. Skowronski, John G. Harris