Audio-visual emotion expression by synthetic agents is widely employed in research, industrial, and commercial applications. However, the mechanism through which people judge the multimodal emotional display of these agents is not yet well understood. This study aims to clarify the interaction between the video and audio channels using a continuous dimensional evaluation framework of valence, activation, and dominance. The results indicate that congruent audio-visual presentation contains information that allows users to differentiate between happy and angry emotional expressions to a greater degree than either channel alone. Interestingly, however, sad and neutral emotions, which exhibit a lesser degree of activation, show more confusion when presented using both channels. Furthermore, when faced with a conflicting emotional presentation, users predominantly attended to the vocal channel. It is speculated that this is mos...
Emily Mower, Sungbok Lee, Maja J. Mataric, Shrikan