This paper combines support vector machine and decision tree learning to recognize four emotions in speech: Neutral, Angry, Lombard, and Loud. The base features were pitch, derivative of pitch, energy, speaking rate, formants, bandwidths, and Mel-frequency cepstral coefficients. Three methods of combining the learned support vector machine and decision tree classifiers are proposed: minimum misclassification, maximum accuracy, and dominant class. On the Speech Under Simulated and Actual Stress database, the average accuracies of the minimum misclassification, maximum accuracy, and dominant class methods were 72.4%,
Thao Nguyen, Mingkun Li, Iris Bass, Ishwar K. Seth