We address the problem of detecting music in the background of ambient real-world audio recordings such as the sound track of consumer-shot video. Such material may contain high levels of noises, and we seek to devise features that will reveal music content in such circumstances. Sustained, steady musical pitches show significant, structured autocorrelation at when calculated over windows of hundreds of milliseconds, where autocorrelation of aperiodic noise has become negligible at higher-lag points if a signal is whitened by LPC. Using such features, further compensated by their long-term average to remove the effect of stationary periodic noise, we produce GMM and SVM based classifiers with high performance compared with previous approaches, as verified on a corpus of real consumer video.
Keansub Lee, Daniel P. W. Ellis