We develop a framework to detect when certain sounds are present in a mixed audio signal. We focus on the regime where out of a large number of possible sounds, a small but unknown number are combined and overlapped to yield the observed signal. To infer which sounds are present, we attempt to decompose the observed signal as a linear combination of a small number of sources. To encourage sparse solutions with this property, we balance the modeling errors from individual sources against an ℓ1-norm penalty of the type used in basis pursuit and regularized linear regression with grouped variables. Our approach can be viewed as a novel generalization of basis pursuit in two ways: first, with a dictionary of fixed size, we attempt to model acoustic waveforms of potentially variable duration; second, for dictionary entries, we do not store basis vectors representing static templates, but the coefficients of autoregressive models that characterize the acoustic variability of individual so...
Youngmin Cho, Lawrence K. Saul