Monaural speech separation based on MAXVQ and CASA for robust speech recognition

Robustness is one of the most important topics for automatic speech recognition (ASR) in practical applications. Monaural speech separation based on computational auditory scene analysis (CASA) offers one way to address it. This paper presents a novel system for separating the monaural speech of two talkers. Gaussian mixture models (GMMs) and vector quantizers (VQs) are used to learn the grouping cues from isolated clean data for each speaker. Given an utterance, speaker identification is first performed to identify the two speakers present in it; the factorial-max vector quantization model (MAXVQ) is then used to infer the mask signals; finally, the target speaker's utterance is resynthesized in the CASA framework. Recognition results on the 2006 speech separation challenge corpus show that the proposed system significantly improves the robustness of ASR.
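
The mask-inference step can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the usual log-max approximation behind factorial-max VQ (each log-spectral bin of the mixture is close to the larger of the two speakers' values in that bin) and uses an exhaustive search over codeword pairs. The function name `maxvq_mask`, the toy codebooks, and the squared-error criterion are all hypothetical choices made for illustration.

```python
from itertools import product


def maxvq_mask(frame, codebook_a, codebook_b):
    """Infer a binary mask for one log-spectral frame of a two-talker mixture.

    Assumes the log-max approximation: each mixture bin is close to the
    larger of the two speakers' log-spectral values in that bin.
    """
    best = None
    for (i, ca), (j, cb) in product(enumerate(codebook_a), enumerate(codebook_b)):
        # Predicted mixture: element-wise max of the two codewords.
        predicted = [max(a, b) for a, b in zip(ca, cb)]
        # Squared error between the observed and predicted frame.
        err = sum((x - p) ** 2 for x, p in zip(frame, predicted))
        if best is None or err < best[0]:
            best = (err, i, j)
    _, i, j = best
    # Mask bin is 1 where speaker A's codeword dominates, else 0.
    mask = [1 if a >= b else 0 for a, b in zip(codebook_a[i], codebook_b[j])]
    return i, j, mask


# Toy per-speaker codebooks (hypothetical values, 4 frequency bins each).
codebook_a = [[5.0, 4.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
codebook_b = [[0.0, 0.0, 3.0, 6.0], [2.0, 2.0, 2.0, 2.0]]
frame = [5.1, 3.9, 3.0, 5.8]  # observed mixture frame (log-spectrum)

i, j, mask = maxvq_mask(frame, codebook_a, codebook_b)
print(i, j, mask)
```

In this toy case the low bins are assigned to speaker A and the high bins to speaker B. A real system would apply such a mask frame by frame to the mixture's time-frequency representation before resynthesis, and would likely need a faster search than the exhaustive loop shown here.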

Peng Li, Yong Guan, Shijin Wang, Bo Xu, Wenju Liu

Automated Reasoning | Computational Auditory Scene Analysis | CSL 2010 | Monaural Speech | Utterance |

Added	09 Dec 2010
Updated	09 Dec 2010
Type	Journal
Year	2010
Where	CSL
Authors	Peng Li, Yong Guan, Shijin Wang, Bo Xu, Wenju Liu
