Defining suitable features for environmental sounds is an important problem in an automatic acoustic scene recognition system. As with most pattern recognition problems, extracting the right feature set is the key to effective performance. A variety of features have been proposed for audio recognition, but the vast majority of the past work utilizes features that are well-known for structured data, such as speech and music, and assumes this association will transfer naturally well to unstructured sounds. In this paper, we propose a novel method based on matching pursuit (MP) to analyze environment sounds for their feature extraction. The proposed MP-based method utilizes a dictionary from which to select features, resulting in a representation that is flexible, yet intuitive and physically interpretable. We will show that these features are less sensitive to noise and are capable of effectively representing sounds that originate from different sources and different frequency ranges....
Selina Chu, Shrikanth S. Narayanan, C. C. Jay Kuo