We propose to exploit synchrony effects, known to exist in the auditory system, to represent the voiced parts of the speech signal robustly. The system decomposes the input signal with a bandpass filter bank and uses a bank of phase-locked loops (PLLs) to estimate which frequencies are present at a given time. This frequency distribution is then transformed into a spectrum-like representation based on synchrony effects. For the noisy speech recognition experiments, this synchrony-based spectrum is reduced to a small set of coefficients by a transformation similar to the one used for mel cepstrum features. We show that the recognition performance of the resulting features compares favorably with that of mel cepstrum features over a range of SNR conditions, especially at high noise levels.
Patricia A. Pelle, Claudio Estienne, Horacio Franc
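As an illustration of the last processing step mentioned in the abstract, the following minimal sketch shows one common way a spectrum-like vector can be reduced to a small set of coefficients in the style of mel cepstrum features: take the logarithm of each band value and apply a DCT-II. The abstract only says the transformation is "similar" to the mel cepstrum one, so the details here are assumptions: the function name `cepstral_like_coefficients`, the number of bands and coefficients, and the flooring constant are all hypothetical and are not taken from the paper.

```python
import numpy as np

def cepstral_like_coefficients(spectrum, num_coeffs=13, floor=1e-10):
    """Reduce a per-band spectrum to a few cepstral-like coefficients.

    Assumed recipe (not confirmed by the paper): log compression
    followed by an unnormalized DCT-II, keeping the first num_coeffs terms.
    """
    spectrum = np.asarray(spectrum, dtype=float)
    num_bands = spectrum.shape[-1]
    # Floor the values so that the logarithm is always defined.
    log_spec = np.log(np.maximum(spectrum, floor))
    # DCT-II basis: cos(pi * k * (n + 0.5) / N), shape (num_coeffs, num_bands).
    n = np.arange(num_bands)
    k = np.arange(num_coeffs)[:, None]
    basis = np.cos(np.pi * k * (n + 0.5) / num_bands)
    return log_spec @ basis.T

# Hypothetical input: a flat 20-band spectrum with every band equal to e.
flat = np.full(20, np.e)
coeffs = cepstral_like_coefficients(flat)
```

For a flat input, only the zeroth coefficient is nonzero, which reflects the fact that the higher-order coefficients describe the shape of the log spectrum rather than its overall level.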