We propose a new channel compensation method for modulation spectral features. We compare the proposed method, subband normalization, with a more traditional method, cepstral mean subtraction (CMS). Experimental results show that subband-normalized modulation scale features provide advantages over CMS features. The proposed method is robust not only to slowly varying convolutional noise, but also to time-scale modification and time misalignment, whereas CMS is not robust to these time distortions. We discuss the theory of estimating a modulation scale representation and its channel compensation. Audio identification is used for experimental verification. Simulation results on a large database show that the proposed method provides high accuracy in spite of convolutional noise and time distortions.
Somsak Sukittanon, Les E. Atlas
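
As an illustration of the CMS baseline named in the abstract, the following is a minimal sketch of textbook cepstral mean subtraction, not the authors' implementation and not the proposed subband normalization. It assumes cepstral features arranged as a frames-by-coefficients array, and it relies on the standard observation that a fixed convolutional channel appears as a constant additive offset in the cepstral domain. The function name, array shapes, and random test data are hypothetical.

```python
import numpy as np


def cepstral_mean_subtraction(cepstra):
    """Subtract the per-coefficient mean over time (axis 0 indexes frames)."""
    cepstra = np.asarray(cepstra, dtype=float)
    return cepstra - cepstra.mean(axis=0, keepdims=True)


# Hypothetical data: 200 frames of 13 cepstral coefficients.
rng = np.random.default_rng(0)
clean = rng.normal(size=(200, 13))

# Assumption: a time-invariant channel adds the same offset to every frame.
channel = rng.normal(size=(1, 13))
distorted = clean + channel

cms_clean = cepstral_mean_subtraction(clean)
cms_distorted = cepstral_mean_subtraction(distorted)

# The constant channel offset is removed by the mean subtraction.
print(np.allclose(cms_clean, cms_distorted))
```

The sketch covers only the fixed-channel case. It does not model time-scale modification or time misalignment, the distortions for which the abstract reports that CMS is not robust.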