We address the problem of instantaneous, underdetermined audio source separation by time-frequency masking. Assuming that the reference sources are available and that the mixing structure is known or has been estimated, we use oracle estimators to determine experimental upper bounds on performance. Oracle estimation of four musical sources from two-channel mixtures demonstrates a potential signal-to-distortion ratio (SDR) improvement of up to 12.7 dB over semi-blind methods. We also show that using adaptive cosine packet transforms, rather than fixed-basis STFTs, can improve performance by up to 2.2 dB. Finally, allowing more than one non-zero source coefficient per time-frequency index could yield improvements of up to 7.7 dB.
Andrew Nesbit, Mark D. Plumbley
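
To make the idea of an oracle estimator concrete, the following is a minimal sketch of one possible single-source oracle mask, written under several assumptions that are not taken from the paper: the mixture and reference sources are already expressed as coefficients of some orthonormal transform, the mixing matrix `A` is known, and exactly one source is allowed to be non-zero at each coefficient index. The function names `oracle_single_source` and `sdr_db` are hypothetical, and `sdr_db` is a plain energy ratio rather than any particular evaluation toolkit's SDR definition.

```python
import numpy as np


def oracle_single_source(X, S, A):
    """Oracle estimate with at most one active source per coefficient index.

    X: (n_channels, n_coeffs) mixture coefficients.
    S: (n_sources, n_coeffs) reference source coefficients (oracle knowledge).
    A: (n_channels, n_sources) instantaneous mixing matrix (assumed known).
    """
    n_coeffs = S.shape[1]
    # Candidate coefficient for source j if it were the only active source:
    # least-squares projection of the mixture onto mixing direction a_j.
    col_energy = np.sum(np.abs(A) ** 2, axis=0)
    cand = (A.conj().T @ X) / col_energy[:, None]
    # Squared error against the references if only source j is kept:
    # every other source is estimated as zero, source j as cand[j].
    total = np.sum(np.abs(S) ** 2, axis=0)
    err = total[None, :] - np.abs(S) ** 2 + np.abs(cand - S) ** 2
    # The oracle picks, per index, the source that minimises this error.
    best = np.argmin(err, axis=0)
    idx = np.arange(n_coeffs)
    S_hat = np.zeros(S.shape, dtype=cand.dtype)
    S_hat[best, idx] = cand[best, idx]
    return S_hat


def sdr_db(ref, est):
    """Simplified SDR in dB: reference energy over error energy."""
    return 10.0 * np.log10(np.sum(np.abs(ref) ** 2) / np.sum(np.abs(ref - est) ** 2))
```

With four sources and a two-channel `A`, comparing `sdr_db` for this oracle against a blind or semi-blind estimate gives the kind of upper-bound gap the abstract reports. Relaxing the one-source-per-index constraint would require searching over subsets of active sources at each index, which this sketch does not attempt.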