In this work, we present a scalable feature set that is obtained by fitting orthogonal polynomials to the normalized modulation spectrum of cepstral coefficients and that can be easily adapted to different classification tasks. The performance of the feature set is investigated in a hierarchically structured audio signal classification experiment and compared with other approaches reported in the literature. For the root categories speech, music, and noise, a classification accuracy of 95% is achieved. Subclasses such as male and female speech or different noise types are classified with accuracies of 95% and 85%, respectively. In a 10-category musical genre discrimination experiment, the proposed features achieve an accuracy of 61%.
Anil M. Nagathil, Peter Gottel, Rainer Martin
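
The feature extraction summarized in the abstract, fitting orthogonal polynomials to the normalized modulation spectrum of cepstral coefficients, might look roughly like the following. This is a minimal sketch and not the authors' implementation. The function name `modulation_poly_features`, the choice of Legendre polynomials as the orthogonal basis, the unit-sum normalization, and the default polynomial degree are all assumptions made for illustration. The input is assumed to be a precomputed matrix of cepstral coefficients with one row per frame.

```python
import numpy as np
from numpy.polynomial import legendre


def modulation_poly_features(cepstra, degree=3):
    """Fit Legendre polynomials to normalized cepstral modulation spectra.

    cepstra: array of shape (n_frames, n_coeffs), one cepstral vector per frame.
    Returns an array of shape (n_coeffs, degree + 1) of polynomial coefficients.
    All design choices here are illustrative assumptions, not the paper's method.
    """
    cepstra = np.asarray(cepstra, dtype=float)

    # Modulation spectrum: magnitude of the DFT of each coefficient's
    # trajectory along the frame (time) axis.
    mod_spec = np.abs(np.fft.rfft(cepstra, axis=0))

    # Assumed normalization: scale each coefficient's spectrum to unit sum,
    # leaving all-zero spectra untouched to avoid division by zero.
    norm = mod_spec.sum(axis=0, keepdims=True)
    mod_spec = mod_spec / np.where(norm > 0, norm, 1.0)

    # Map the modulation-frequency bins onto [-1, 1], the interval on which
    # Legendre polynomials are orthogonal.
    x = np.linspace(-1.0, 1.0, mod_spec.shape[0])

    # Least-squares fit; legfit treats each column of a 2-D y independently
    # and returns an array of shape (degree + 1, n_coeffs).
    coeffs = legendre.legfit(x, mod_spec, degree)
    return coeffs.T
```

Under these assumptions, raising or lowering `degree` changes the number of features per cepstral coefficient, which is one way the feature set could be scaled for different classification tasks.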