Ideal binary mask (idbm) techniques were originally used as a tool for studying aspects of the auditory system. More recently, they have been adapted to the practical problem of retrieving a target speech signal from a noisy observation. In this practical setting, binary mask techniques show similarities with existing DFT-based speech enhancement techniques. In this context, we derive single-channel binary mask estimators that minimize the spectral magnitude mean-square error. We show in simulation experiments with natural speech and noise signals that the proposed estimators perform significantly better than existing binary mask estimators. However, even the best of the proposed estimators is clearly outperformed by non-binary estimators, both in terms of speech quality and intelligibility.
Jesper Jensen, Richard C. Hendriks
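
For readers unfamiliar with the concept, the sketch below illustrates the general idea of an ideal binary mask in the DFT domain. It is not one of the estimators proposed here. It assumes oracle access to the target and noise spectral magnitudes and uses a common textbook rule: keep a time-frequency bin when its local SNR exceeds a threshold. The function names and the `lc_db` threshold parameter are hypothetical, chosen for illustration only.

```python
import numpy as np


def ideal_binary_mask(target_mag, noise_mag, lc_db=0.0):
    """Return a 0/1 mask: 1 where the local SNR (in dB) exceeds lc_db.

    target_mag, noise_mag: arrays of spectral magnitudes with the same
    shape (e.g. frequency bins x frames). Both are assumed known, which
    is what makes the mask "ideal" rather than estimated.
    """
    eps = 1e-12  # guards against log(0) and division by zero
    local_snr_db = 20.0 * np.log10((target_mag + eps) / (noise_mag + eps))
    return (local_snr_db > lc_db).astype(float)


def apply_mask(noisy_spec, mask):
    """Zero out the time-frequency bins that the mask rejects."""
    return mask * noisy_spec


# Toy example with a 2 x 2 grid of time-frequency bins.
target_mag = np.array([[2.0, 0.1], [0.5, 3.0]])
noise_mag = np.ones((2, 2))
mask = ideal_binary_mask(target_mag, noise_mag)
enhanced = apply_mask(target_mag + noise_mag, mask)
```

In a single-channel setting the target and noise magnitudes are not observable separately, so a practical method has to estimate the mask from the noisy observation alone.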