Recent research in machine learning has focused on decomposing audio spectrograms into separate sound sources using latent variable models. These methods require the number of sources to be specified in advance, which is not always possible. To address this problem, we develop Gamma Process Nonnegative Matrix Factorization (GaP-NMF), a Bayesian nonparametric approach to decomposing spectrograms. GaP-NMF's assumptions draw on signal-processing research on the expected distributions of spectrogram data, and the model automatically discovers the number of latent sources. We derive a mean-field variational inference algorithm and evaluate GaP-NMF on both synthetic data and recorded music.
Matthew D. Hoffman, David M. Blei, Perry R. Cook
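To illustrate the kind of model summarized above, the following Python sketch draws a synthetic power spectrogram from a finite, L-component truncation of a gamma-process NMF prior. The abstract does not state these details, so several choices here are assumptions made for illustration: the gamma priors, the exponential likelihood, the hyperparameters (`alpha`, `c`, `a`, `b`, `L`), and the function name `sample_gap_nmf`. The sketch covers only the generative side. It does not implement the mean-field variational inference algorithm.

```python
import random


def sample_gap_nmf(F, T, L=50, alpha=1.0, c=1.0, a=0.1, b=0.1, seed=0):
    """Draw (theta, W, H, X) from a truncated gamma-process NMF prior.

    Assumed (hypothetical) model, not taken from the abstract:
      theta_l ~ Gamma(shape=alpha/L, rate=alpha*c)   per-source weight
      W_fl    ~ Gamma(shape=a, rate=a)               spectral shape of source l
      H_lt    ~ Gamma(shape=b, rate=b)               activation of source l at time t
      X_ft    ~ Exponential(mean=sum_l theta_l * W_fl * H_lt)
    """
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale), so scale = 1 / rate.
    theta = [rng.gammavariate(alpha / L, 1.0 / (alpha * c)) for _ in range(L)]
    W = [[rng.gammavariate(a, 1.0 / a) for _ in range(L)] for _ in range(F)]
    H = [[rng.gammavariate(b, 1.0 / b) for _ in range(T)] for _ in range(L)]

    X = []
    for f in range(F):
        row = []
        for t in range(T):
            # Expected power at (f, t): weighted sum over the L candidate sources.
            mean = sum(theta[l] * W[f][l] * H[l][t] for l in range(L))
            # Exponential with this mean = mean * Exponential(rate 1).
            row.append(mean * rng.expovariate(1.0))
        X.append(row)
    return theta, W, H, X


theta, W, H, X = sample_gap_nmf(F=4, T=6, L=20, seed=1)
```

Because each weight `theta_l` has the small shape parameter `alpha / L`, most sampled weights tend to be close to zero, and only a few components contribute appreciably to `X`. Under these assumptions, this is the intuition for how a model of this kind can leave the effective number of sources to be inferred rather than fixed in advance.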