This study explores manifold representations of emotionally modulated speech. The manifolds are derived with isometric feature mapping (Isomap) in the articulatory space and in two acoustic spaces, Mel filterbank (MFB) and Mel-frequency cepstral coefficient (MFCC), using data from an emotional speech corpus. Their effectiveness in representing emotional speech is assessed by emotion classification accuracy. Results show that the effective manifold dimension is about 5 in both the articulatory and MFB spaces but greater in the MFCC space. Moreover, classification accuracies in the articulatory and MFB manifolds are close to those in the original spaces, whereas this does not hold for the MFCC manifold. We speculate that the manifold in the MFCC space is less structured, or more distorted, than the others.
Jangwon Kim, Sungbok Lee, Shrikanth S. Narayanan
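The pipeline summarized above (Isomap embedding of speech features, then emotion classification in the manifold space) can be sketched as follows. This is an illustrative sketch only: the corpus, the exact features, and the classifier used in the study are not specified here, so synthetic data and a hypothetical k-nearest-neighbor classifier stand in for them.

```python
import numpy as np
from sklearn.manifold import Isomap
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Stand-in feature matrix: 300 frames of a 13-dimensional feature
# (MFCC-like), with 4 hypothetical emotion-class labels.
X = rng.standard_normal((300, 13))
y = rng.integers(0, 4, size=300)

# Isomap embedding into a 5-dimensional manifold, matching the
# effective dimension the study reports for the articulatory and
# MFB spaces; n_neighbors is an assumed hyperparameter.
emb = Isomap(n_neighbors=10, n_components=5).fit_transform(X)

# Emotion classification accuracy in the manifold space, estimated
# by cross-validation with a k-NN classifier.
acc = cross_val_score(KNeighborsClassifier(n_neighbors=5), emb, y, cv=3).mean()
print(emb.shape, round(float(acc), 3))
```

Comparing this accuracy against the same classifier run on the original feature space `X` mirrors the study's evaluation of how faithfully each manifold represents the emotional content.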