The performance of a typical speaker verification system degrades significantly in reverberant environments. This degradation is partly due to conventional feature extraction/compensation techniques, which use analysis windows much shorter than typical room impulse responses. In this paper, we present a feature extraction technique which estimates long-term envelopes of speech in narrow sub-bands using frequency domain linear prediction (FDLP). When speech is corrupted by reverberation, the long-term sub-band envelopes are convolved in time with those of the room impulse response function. To a first-order approximation, gain normalization of these envelopes in the FDLP model suppresses the room reverberation artifacts. Experiments are performed on the 8 core conditions of the NIST 2008 speaker recognition evaluation (SRE). In these experiments, the FDLP features provide significant improvements on the interview microphone conditions (relative improvements of 20-30%) ov...
Sriram Ganapathy, Jason W. Pelecanos, Mohamed Kama
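
As an illustration of the gain normalization mentioned in the abstract, the following is a minimal sketch of an FDLP-style envelope estimate, not the authors' implementation. It assumes the common FDLP formulation in which linear prediction is applied to a band of DCT coefficients, so that the all-pole model evaluated over [0, pi] approximates the temporal envelope across the analysis window. All function names, the band limits, the model order, and the test signal are hypothetical. A plain amplitude scaling stands in for a gain change; no room impulse response is simulated.

```python
import math
import random


def dct2(x):
    """Unnormalized DCT-II of a real sequence (O(N^2), adequate for a short demo)."""
    n = len(x)
    return [sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
            for k in range(n)]


def autocorr(c, order):
    """Autocorrelation of the sequence c for lags 0..order."""
    return [sum(c[i] * c[i + lag] for i in range(len(c) - lag))
            for lag in range(order + 1)]


def levinson(r, order):
    """Levinson-Durbin recursion. Returns (a, err) with a[0] == 1."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def fdlp_envelope(x, band=None, order=8, n_points=64, normalize_gain=True):
    """Hypothetical helper: all-pole envelope fitted to a band of DCT coefficients.

    With normalize_gain=True the model gain is set to 1, so only the shape
    of the envelope is kept.
    """
    c = dct2(x)
    if band is not None:
        lo, hi = band
        c = c[lo:hi]  # assumption: a sub-band is a contiguous slice of DCT coefficients
    a, err = levinson(autocorr(c, order), order)
    gain = 1.0 if normalize_gain else err
    env = []
    for m in range(n_points):
        w = math.pi * m / n_points  # w in [0, pi) spans the analysis window in time
        re = sum(a[k] * math.cos(w * k) for k in range(order + 1))
        im = -sum(a[k] * math.sin(w * k) for k in range(order + 1))
        env.append(gain / (re * re + im * im))
    return env


# Synthetic test signal (an assumption): a short tone burst plus weak noise.
rng = random.Random(0)
x = [math.exp(-((t - 60) / 12.0) ** 2) * math.sin(1.1 * t) + 0.01 * rng.gauss(0, 1)
     for t in range(200)]
x_scaled = [3.0 * v for v in x]  # same signal with a different overall gain

env_clean = fdlp_envelope(x, band=(40, 100))
env_scaled = fdlp_envelope(x_scaled, band=(40, 100))
raw_clean = fdlp_envelope(x, band=(40, 100), normalize_gain=False)
raw_scaled = fdlp_envelope(x_scaled, band=(40, 100), normalize_gain=False)
```

In this sketch the prediction coefficients do not depend on the overall level of the input, so the gain-normalized envelopes of `x` and `x_scaled` coincide up to rounding, while the unnormalized ones differ by the squared scale factor. How well a constant sub-band gain describes real reverberation is the first-order approximation stated in the abstract and is not tested here.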