Almost all current automatic speech recognition (ASR) systems conventionally append delta and double-delta cepstral features to static cepstral features. In this work we describe a modified featureextraction procedure in which the time-difference operation is performed in the spectral domain, rather than the cepstral domain as is generally presently done. We argue that this approach based on “delta-spectral” features is needed because even though deltacepstral features capture dynamic speech information and generally greatly improve ASR recognition accuracy, they are not robust to noise and reverberation. We support the validity of the delta-spectral approach both with observations about the modulation spectrum of speech and noise, and with objective experiments that document the benefit that the delta-spectral approach brings to a variety of currently popular feature extraction algorithms. We found that the use of delta-spectral features, rather than the more traditional deltac...
Kshitiz Kumar, Chanwoo Kim, Richard M. Stern