Sound textures may be defined as sounds whose character depends as much on statistical properties as on the specific details of each individually perceived event. Recent work has devised a set of statistics that, when synthetically imposed, allow listeners to identify a wide range of environmental sound textures. In this work, we investigate using these statistics for automatic classification of environmental sound classes defined over web videos depicting “multimedia events”. We show that the texture statistics perform as well as our best conventional features (based on MFCC covariance). We further examine the relative contributions of the different statistics, showing the importance of modulation spectra and cross-band envelope correlations.
Daniel P. W. Ellis, Xiaohong Zeng, Josh H. McDermott
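To make one of the highlighted statistics concrete, the sketch below computes cross-band envelope correlations for a sound: the signal is split into frequency subbands, each subband's Hilbert envelope is extracted, and the correlation matrix across bands is taken. This is a minimal illustration assuming numpy and scipy; the Butterworth filterbank, band count, and function name are illustrative choices, not the implementation used in the paper.

import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def cross_band_envelope_correlations(x, sr, n_bands=8, fmin=100.0, fmax=8000.0):
    """Correlation matrix of subband envelopes (one row/column per band).
    Band edges are log-spaced between fmin and fmax; all parameter
    defaults here are illustrative assumptions."""
    edges = np.geomspace(fmin, fmax, n_bands + 1)
    envs = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Bandpass-filter the signal into one subband (zero-phase).
        sos = butter(4, [lo, hi], btype="bandpass", fs=sr, output="sos")
        band = sosfiltfilt(sos, x)
        # Envelope = magnitude of the analytic (Hilbert) signal.
        envs.append(np.abs(hilbert(band)))
    # Pairwise correlations between band envelopes: (n_bands, n_bands).
    return np.corrcoef(np.vstack(envs))

# Example: correlations for one second of white noise at 16 kHz.
x = np.random.randn(16000)
C = cross_band_envelope_correlations(x, sr=16000)

High off-diagonal values in C indicate comodulation across frequency bands, the kind of structure the cross-band envelope correlation statistics are intended to capture.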