This paper deals with the problem of under-determined convolutive blind source separation. We model the contribution of each source to all mixture channels in the time-frequency domain as a zero-mean Gaussian random variable whose covariance encodes the spatial properties of the source. We consider two covariance models and estimate their parameters from the recorded mixture using a suitable initialization scheme followed by an iterative expectation-maximization (EM) procedure in each frequency bin. We then align the order of the estimated sources across all frequency bins based on their estimated directions of arrival (DOA). Experimental results on a stereo reverberant speech mixture show the effectiveness of the proposed approach.
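Under this Gaussian model, once the covariances are known, each source's contribution to the mixture channels (its spatial image) can be recovered in every time-frequency bin by multichannel Wiener filtering, which gives the posterior mean. The sketch below illustrates only that step. It assumes a common parameterization, not stated in this abstract, in which the image covariance is a time-varying scalar variance `v_j(n, f)` times a time-invariant spatial covariance matrix `R_j(f)`. The function name `wiener_source_images` is hypothetical, and neither the initialization scheme, the EM updates, nor the DOA-based alignment is reproduced here.

```python
import numpy as np


def wiener_source_images(x, variances, spatial_covs):
    """Posterior mean of each source's spatial image in one time-frequency bin.

    Assumed (hypothetical) parameterization: Sigma_j = v_j * R_j.

    x            : (I,) complex mixture vector, I = number of channels
    variances    : (J,) nonnegative source variances v_j(n, f)
    spatial_covs : (J, I, I) Hermitian spatial covariance matrices R_j(f)
    Returns a (J, I) complex array of estimated source images c_j(n, f).
    """
    # Covariance of each source image.
    sigmas = variances[:, None, None] * spatial_covs
    # Sources are independent, so the mixture covariance is the sum.
    sigma_x = sigmas.sum(axis=0)
    # Solve Sigma_x y = x once, then c_j = Sigma_j Sigma_x^{-1} x for every j.
    y = np.linalg.solve(sigma_x, x)
    return sigmas @ y


# Usage: a stereo mixture of three sources (under-determined), with rank-1
# spatial covariances R_j = h_j h_j^H built from random mixing vectors h_j.
rng = np.random.default_rng(0)
I, J = 2, 3
h = rng.standard_normal((J, I)) + 1j * rng.standard_normal((J, I))
R_rank1 = np.einsum("ji,jk->jik", h, h.conj())
v = np.array([1.0, 0.5, 2.0])
x = rng.standard_normal(I) + 1j * rng.standard_normal(I)
c = wiener_source_images(x, v, R_rank1)
print(np.allclose(c.sum(axis=0), x))  # True: the estimated images add up to the mixture
```

Because the filters `Sigma_j Sigma_x^{-1}` sum to the identity, the estimated source images always add up exactly to the observed mixture. The same function works unchanged for rank-1 or full-rank (unconstrained Hermitian) spatial covariances.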
Ngoc Q. K. Duong, Emmanuel Vincent, Rémi Gr