The paper presents the framework of a special session that aims to investigate the best available techniques for multimodal emotion recognition and expressivity analysis in human-computer interaction, grounded in a common psychological background. The session deals mainly with audio and visual emotion analysis, with physiological signal analysis serving as a supplement to these modalities. Specific topics examined include the extraction of emotional features and signs from each modality separately, the integration of the outputs of single-mode emotion analysis systems, and the recognition of the user's emotional state, taking into account emotion models and existing knowledge or demands from both the analysis and synthesis perspectives. Various labelling schemes, the provision of correspondingly labelled test databases, and the synthesis of expressive avatars and affective interactions are further issues raised and examined in the proposed framework.
Stefanos D. Kollias, Kostas Karpouzis