In this paper we describe a system that separates signals by comparing the interaural time delays (ITDs) of their time-frequency components to a threshold ITD. In previous algorithms the threshold ITD was fixed, having been obtained empirically from training data in a specific environment, but in real environments the characteristics that affect the optimal value of this threshold are unknown and possibly time-varying. If these characteristics differ from those of the environment in which the ITD threshold was pre-computed, the performance of the source separation system degrades. We present an algorithm that chooses the threshold ITD that minimizes the cross-correlation between the target and interfering signals after a compressive nonlinearity has been applied. We demonstrate that this algorithm provides speech recognition accuracy that is much more robust to changes in environment than the accuracy obtained using a fixed threshold ITD.
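The threshold selection described above can be illustrated with a minimal sketch. It is not the implementation used in this paper; it rests on several assumptions made only for illustration: per-bin ITD estimates and per-bin power are already available as arrays, the target is taken to be the set of components with |ITD| at or below the threshold, the compressive nonlinearity is a power law with a hypothetical exponent, and the quantity minimized is the magnitude of the normalized cross-correlation of the frame-level powers. The function name `select_itd_threshold` and all parameter names are hypothetical.

```python
import numpy as np


def select_itd_threshold(itd, power, candidates, exponent=0.1):
    """Pick the candidate ITD threshold with the least-correlated split.

    itd        : (frames, channels) array of per-bin ITD estimates
    power      : (frames, channels) array of per-bin power
    candidates : iterable of candidate threshold values (same unit as itd)
    exponent   : power-law exponent of the compressive nonlinearity
                 (assumed form; the value 0.1 is a placeholder)

    Returns (best_threshold, best_correlation), or (None, inf) if no
    candidate yields a defined correlation.
    """
    itd = np.asarray(itd, dtype=float)
    power = np.asarray(power, dtype=float)
    best_tau, best_rho = None, np.inf

    for tau in candidates:
        # Assumption: components with small |ITD| belong to the target.
        target_mask = np.abs(itd) <= tau

        # Frame-level power on each side of the split.
        p_target = np.where(target_mask, power, 0.0).sum(axis=1)
        p_interf = np.where(target_mask, 0.0, power).sum(axis=1)

        # Compressive nonlinearity (power law, assumed).
        r_target = p_target ** exponent
        r_interf = p_interf ** exponent

        # Normalized cross-correlation of the mean-removed sequences.
        r_target = r_target - r_target.mean()
        r_interf = r_interf - r_interf.mean()
        denom = np.sqrt(np.sum(r_target ** 2) * np.sum(r_interf ** 2))
        if denom == 0.0:
            # One side is constant (e.g. empty); correlation is undefined.
            continue

        rho = abs(np.sum(r_target * r_interf) / denom)
        if rho < best_rho:
            best_tau, best_rho = tau, rho

    return best_tau, best_rho
```

The intuition behind the sketch is that a threshold that is too small leaks target energy into the interfering signal, which makes the two compressed power sequences rise and fall together; a threshold that splits the sources cleanly leaves them close to uncorrelated when the sources themselves are independent.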
Chanwoo Kim, Richard M. Stern, Kiwan Eom, Jaewon L