Discovering a parsimonious representation of auditory data is useful for many machine learning and signal processing tasks. Such a representation can be constructed by Non-negative Matrix Factorisation (NMF), a method for finding parts-based representations of non-negative data. Here, we present a convolutive NMF algorithm with multiplicative updates that includes a sparseness constraint on the activations. Applied to a spectral magnitude transform of speech, this method extracts speech phones that exhibit sparse activation patterns; we then use these phones in a supervised separation scheme for monophonic mixtures.
Paul D. O'Grady, Barak A. Pearlmutter
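To make the model concrete, the sketch below implements a generic sparse convolutive NMF with multiplicative updates in NumPy. It approximates a non-negative magnitude spectrogram V (frequency × time) as Λ = Σ_t W_t · shift(H, t), where each lag t has its own basis matrix W_t and shift(H, t) moves the activation columns t frames to the right. Several choices here are assumptions rather than the authors' exact formulation: the cost is taken to be the squared Euclidean reconstruction error plus an L1 penalty λ·ΣH on the activations, each basis (all lags together) is rescaled to unit norm after every iteration so the penalty cannot be evaded by shrinking H, and the function names `shift`, `reconstruct` and `sparse_conv_nmf` are hypothetical.

```python
import numpy as np


def shift(X, t):
    """Shift the columns of X by t frames (right if t > 0, left if t < 0), zero-filling."""
    if t == 0:
        return X.copy()
    out = np.zeros_like(X)
    if t > 0:
        out[:, t:] = X[:, :-t]
    else:
        out[:, :t] = X[:, -t:]
    return out


def reconstruct(W, H):
    """Convolutive model: Lambda = sum_t W[t] @ shift(H, t); W is (T, M, K), H is (K, N)."""
    return sum(W[t] @ shift(H, t) for t in range(W.shape[0]))


def sparse_conv_nmf(V, K, T, lam=0.1, n_iter=200, seed=0, eps=1e-9):
    """Hypothetical sketch: minimise 0.5*||V - Lambda||^2 + lam*sum(H) (an assumed cost)."""
    rng = np.random.default_rng(seed)
    M, N = V.shape
    W = rng.random((T, M, K)) + eps
    H = rng.random((K, N)) + eps
    for _ in range(n_iter):
        # H update: positive/negative parts of the gradient summed over all lags.
        # The adjoint of a right shift by t is a left shift by t, hence shift(., -t).
        L = reconstruct(W, H)
        num = sum(W[t].T @ shift(V, -t) for t in range(T))
        den = sum(W[t].T @ shift(L, -t) for t in range(T)) + lam
        H *= num / (den + eps)
        # W update, one lag at a time, recomputing the model after each lag.
        for t in range(T):
            L = reconstruct(W, H)
            Ht = shift(H, t)
            W[t] *= (V @ Ht.T) / (L @ Ht.T + eps)
        # Rescale each basis to unit norm across all lags and push the scale into H,
        # which leaves the reconstruction unchanged.
        norms = np.sqrt((W ** 2).sum(axis=(0, 1))) + eps  # shape (K,)
        W /= norms
        H *= norms[:, None]
    return W, H
```

Placing λ in the denominator of the H update is the usual way an L1 penalty enters a multiplicative rule. It keeps every factor non-negative and pushes small activations towards zero, which produces the sparse activation patterns described above. In a supervised separation setting, one would typically learn W separately on training speech from each source and then hold it fixed while fitting H to the mixture.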