Recent technological advances have enabled human users to interact with computers in ways previously unimaginable. Beyond the confines of the keyboard and mouse, new modalities for human-computer interaction such as voice, gesture, and force-feedback are emerging. However, one necessary ingredient for natural interaction is still missing - emotions. This paper describes the problem of bimodal emotion recognition and advocates the use of probabilistic graphical models when fusing the different modalities. We test our audio-visual emotion recognition approach on 38 subjects with 11 HCI-related affect states. The experimental results show that the average person-dependent emotion recognition accuracy is greatly improved when both visual and audio information are used in classification.
Nicu Sebe, Ira Cohen, Theo Gevers, Thomas S. Huang