We consider the problem of convolutive blind source separation of stereo mixtures. This problem is often tackled using frequency-domain independent component analysis (FD-ICA), or time-frequency masking methods such as DUET. In these methods, the short-time Fourier transform (STFT) is used to transform the signal into the time-frequency domain. Instead of applying a fixed time-frequency transform, such as the STFT, to each mixture channel, we propose learning an adaptive transform from the stereo mixture pair. Many basis vector pairs of the resulting transform exhibit properties suggesting that they represent the components of individual sources, together with the filtering process from the sources to the microphone pair. A mask is then applied to the transformed signal, with the mask parameters determined by the relative delays between the learned left and right basis vector pairs. The performance of the proposed adaptive stereo basis (ASB) algorithm is compared with FD-ICA and DUET under different re...
Maria G. Jafari, Emmanuel Vincent, Samer A. Abdall
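
The delay-based masking step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the delays are estimated or how the mask is built, so everything here is an assumption: the relative delay of each left/right basis vector pair is taken as the lag of the cross-correlation peak, and each basis pair is assigned to the nearest of a set of candidate source delays to form binary masks. The function names `relative_delay` and `delay_masks` and their parameters are hypothetical and do not come from the paper.

```python
import numpy as np


def relative_delay(left, right, max_lag):
    """Estimate how many samples `right` lags `left` (assumed estimator).

    Uses the lag of the largest-magnitude cross-correlation value,
    searched over lags in [-max_lag, max_lag].
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    # c[k] = sum_n right[n + k] * left[n]; peaks at k = d if right is left delayed by d.
    xcorr = np.correlate(right, left, mode="full")
    lags = np.arange(-(len(left) - 1), len(right))
    keep = np.abs(lags) <= max_lag
    return int(lags[keep][np.argmax(np.abs(xcorr[keep]))])


def delay_masks(delays, source_delays):
    """Build one binary mask per source over the basis pairs (assumed rule).

    Each basis pair is assigned to the candidate source delay closest to
    its own estimated delay. Returns a boolean array of shape
    (n_sources, n_basis_pairs).
    """
    delays = np.asarray(delays, dtype=float)
    source_delays = np.asarray(source_delays, dtype=float)
    distance = np.abs(delays[:, None] - source_delays[None, :])
    nearest = np.argmin(distance, axis=1)
    return nearest[None, :] == np.arange(len(source_delays))[:, None]
```

Under these assumptions, multiplying the transform coefficients of the mixture (one row per basis pair) by `delay_masks(...)[j][:, None]` would keep only the components attributed to source `j` before inverting the transform.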