Robust automatic speech recognition with decoder oriented ideal binary mask estimation


Download www.isle.illinois.edu

In this paper, we propose a jointly optimal method for automatic speech recognition (ASR) and ideal binary mask (IBM) estimation in the cepstral domain through a newly derived generalized expectation maximization algorithm. First, cepstral domain missing feature marginalization is established using a linear transformation, after tying the mean and variance of non-existing cepstral coefficients. Second, IBM estimation is formulated with a generalized expectation maximization algorithm that directly optimizes ASR performance. Experimental results show that even in a highly non-stationary mismatched condition (dance music as background noise), the proposed method achieves much higher ASR accuracy than the conventional noise suppression method, with absolute improvements ranging from 14.69% at 0 dB SNR to 40.10% at 15 dB SNR.
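For readers unfamiliar with the two building blocks named in the abstract, the following is a minimal sketch and not the authors' method. It assumes the textbook oracle definition of the IBM (a time-frequency cell is kept when its local SNR exceeds a threshold) and the simpler spectral-domain form of missing feature marginalization with a diagonal Gaussian. The paper instead estimates the mask with generalized expectation maximization and marginalizes in the cepstral domain. The function names, the `lc_db` threshold parameter, and the small energy floor are all hypothetical choices made for illustration.

```python
import math


def ideal_binary_mask(speech_energy, noise_energy, lc_db=0.0):
    """Oracle IBM: 1 where the local SNR in dB exceeds lc_db, else 0.

    Both inputs are lists of frames; each frame is a list of
    per-frequency-bin energies. Needs the clean speech and the noise
    separately, so it is only available in simulation.
    """
    floor = 1e-12  # illustrative floor to avoid log(0); an assumption
    mask = []
    for s_row, n_row in zip(speech_energy, noise_energy):
        row = []
        for s, n in zip(s_row, n_row):
            snr_db = 10.0 * math.log10((s + floor) / (n + floor))
            row.append(1 if snr_db > lc_db else 0)
        mask.append(row)
    return mask


def marginal_log_likelihood(frame, mask_row, mean, var):
    """Diagonal-Gaussian log-likelihood over reliable dimensions only.

    Dimensions with mask 0 are integrated out, which for a diagonal
    covariance amounts to skipping them in the sum.
    """
    ll = 0.0
    for x, m, mu, v in zip(frame, mask_row, mean, var):
        if m:
            ll += -0.5 * (math.log(2.0 * math.pi * v) + (x - mu) ** 2 / v)
    return ll


# Toy example: 2 frames x 2 frequency bins.
speech = [[4.0, 1.0], [0.25, 9.0]]
noise = [[1.0, 2.0], [0.5, 1.0]]
mask = ideal_binary_mask(speech, noise)

# Score the first frame against a unit-variance, zero-mean model,
# ignoring the bin that the mask marks as noise dominated.
score = marginal_log_likelihood([0.0, 100.0], mask[0], [0.0, 0.0], [1.0, 1.0])
```

Skipping masked dimensions is only valid here because the covariance is diagonal in the spectral domain. Cepstral coefficients mix all spectral bins, which is why the abstract describes a linear transformation for carrying the marginalization into the cepstral domain.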

Lae-Hoon Kim, Kyung-Tae Kim, Mark Hasegawa-Johnson


Cepstral Domain | Expectation Maximization Algorithm | Generalized Expectation Maximization | INTERSPEECH 2010 | Signal Processing


» A computational auditory scene analysis system for speech segregation and robust speech re...

» A Classification-based Cocktail-party Processor

Post Info

Added	18 May 2011
Updated	18 May 2011
Type	Journal
Year	2010
Where	INTERSPEECH
Authors	Lae-Hoon Kim, Kyung-Tae Kim, Mark Hasegawa-Johnson
