This paper considers a max-margin framework for speech emotion recognition that incorporates a loss function based on Watson and Tellegen's well-known emotion model. Each emotion is modeled by a single-state hidden Markov model (HMM) trained by maximizing the minimum separation margin between emotions, with the margin scaled by the loss function. The framework is optimized by semidefinite programming. Experiments were performed to evaluate the framework using the Berlin database of emotional speech. The framework performed better than conventional HMM training criteria such as maximum likelihood estimation and maximum mutual information estimation.
Sungrack Yun, Chang D. Yoo