We describe a content-based speech discrimination algorithm for broadcast news that relies on the time-varying information provided by the modulation spectrum. Because the acoustic and modulation frequency subspaces differ in their degrees of redundancy and discriminative power, we first reduce dimensionality with a generalization of the SVD to tensors, the higher-order SVD (HOSVD). We then select the optimal principal axes in each subspace based on mutual information. Projecting the modulation spectral features onto these axes yields a compact feature set at a very low cost for subsequent classification with SVMs. We present an experimental comparison between our algorithm and MFCCs using the same classifier and dataset.
Maria E. Markaki, Yannis Stylianou
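
As an illustration of the dimensionality-reduction step described in the abstract, the following is a minimal NumPy sketch of a truncated HOSVD applied to a stack of modulation spectra. It is not the authors' implementation. The tensor layout (samples × acoustic frequency × modulation frequency), the function names `unfold`, `hosvd_axes`, and `project`, the ranks, and the random data are all assumptions made for the example. In particular, the sketch simply keeps the leading singular vectors in each mode, whereas the abstract selects principal axes by mutual information; any preprocessing such as mean removal is also omitted.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: rows index the chosen mode, columns all other modes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def hosvd_axes(data, ranks):
    """Leading left singular vectors of the acoustic and modulation unfoldings.

    data  : array of shape (n_samples, n_acoustic, n_modulation) (assumed layout)
    ranks : (r_acoustic, r_modulation), number of axes kept in each subspace
    """
    axes = []
    for mode, rank in zip((1, 2), ranks):
        u, _, _ = np.linalg.svd(unfold(data, mode), full_matrices=False)
        # Truncation by singular value stands in for the mutual-information
        # selection mentioned in the abstract.
        axes.append(u[:, :rank])
    return axes


def project(data, u_acoustic, u_modulation):
    """Project each sample X onto the retained axes: Z = U_a^T X U_m."""
    z = np.einsum("nfm,fa,mb->nab", data, u_acoustic, u_modulation)
    return z.reshape(data.shape[0], -1)  # one compact feature vector per sample


# Hypothetical data: 20 modulation spectra, 8 acoustic x 6 modulation bins.
rng = np.random.default_rng(0)
data = rng.standard_normal((20, 8, 6))

u_a, u_m = hosvd_axes(data, ranks=(3, 2))
features = project(data, u_a, u_m)  # shape (20, 3 * 2)
```

The resulting `features` matrix could then be passed to any standard SVM implementation. Treating the two frequency modes separately is what lets a different number of axes be retained in each subspace.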