Dictionary learning through matrix factorization has become popular for music transcription and source separation. These methods learn a concise set of dictionary atoms that represent spectrograms of musical objects. However, there is no guarantee that the learned atoms will be perceptually meaningful, particularly when there is significant spectral and temporal overlap among the musical sources. In this paper, we propose a novel dictionary learning method that imposes additional harmonic constraints on the atoms of the learned dictionary while allowing the dictionary size to grow appropriately during the learning procedure. When there is significant spectral and temporal overlap among the musical sources, our method outperforms popular existing matrix factorization methods as measured by the recall and precision of the learned dictionary atoms.
Steven K. Tjoa, Matthew C. Stamm, W. Sabrina Lin,