Current state-of-the-art speech recognition systems work well in controlled environments, but their performance degrades severely in realistic, reverberant acoustic conditions. In this paper, we build on recent developments that represent reverberation in the cepstral feature domain as a filtering operation, and we formulate a maximum likelihood objective to obtain an inverse reverberation filter. We show analytically that the optimal inverse filter can be obtained approximately under certain assumptions about the corresponding clean speech signal. We demonstrate that our approach reduces the relative gap in word error rate by 30 percent for both large and small reverberation times.
Kshitiz Kumar, Richard M. Stern