We propose a framework for estimating and analyzing the temporal facial expression patterns of a speaker. The proposed system aims to learn personalized elementary dynamic facial expression patterns for a particular speaker. We use head-and-shoulder stereo video sequences to track the lip, eye, eyebrow, and eyelid motion of a speaker in 3D. MPEG-4 Facial Definition Parameters (FDPs) are used as the feature set, and temporal facial expression patterns are represented by MPEG-4 Facial Animation Parameters (FAPs). We perform Hidden Markov Model (HMM) based unsupervised temporal segmentation of the upper and lower facial expression features separately to determine the recurrent elementary facial expression patterns of a particular speaker. These patterns, coded as FAP sequences, are not necessarily tied to prespecified emotions, and they can be used for personalized emotion estimation and synthesis for a speaker. Experimental results are presented.
Ferda Ofli, Engin Erzin, Yucel Yemez, A. Murat Tek
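The HMM-based unsupervised temporal segmentation step can be illustrated with a minimal sketch. The abstract does not specify the HMM topology or training procedure, so this example assumes a fully connected HMM with diagonal-covariance Gaussian emissions trained by segmental k-means (Viterbi training). This is a swapped-in technique, not the authors' method. The function names `segment`, `viterbi`, and `log_gauss_diag` and the toy 2-D input are hypothetical stand-ins for real FAP feature vectors. Under the setup described above, one would call `segment` separately on the upper-face and lower-face feature streams.

```python
import numpy as np


def log_gauss_diag(X, means, variances):
    """Per-frame log-likelihood under each diagonal-covariance Gaussian state, shape (T, K)."""
    diff = X[:, None, :] - means[None, :, :]
    return -0.5 * (np.log(2 * np.pi * variances).sum(axis=1)[None, :]
                   + (diff ** 2 / variances[None, :, :]).sum(axis=2))


def viterbi(log_emit, log_trans, log_start):
    """Most likely state sequence given log emission, transition and start scores."""
    T, K = log_emit.shape
    delta = log_start + log_emit[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans  # rows: previous state, cols: next state
        back[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emit[t]
    path = np.empty(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path


def segment(X, n_states, n_iter=10, var_floor=1e-3):
    """Unsupervised HMM segmentation via segmental k-means (an assumed training scheme).

    X: (T, D) per-frame features (hypothetically, FAP vectors for one face region).
    Returns a list of (start, end, state) runs, with `end` exclusive.
    """
    T, D = X.shape
    # Deterministic start: assign frames to states by quantiles of the first feature.
    labels = np.empty(T, dtype=int)
    labels[np.argsort(X[:, 0], kind="stable")] = np.arange(T) * n_states // T
    for _ in range(n_iter):
        # Re-estimate Gaussian emission parameters from the current hard assignment.
        means = np.zeros((n_states, D))
        variances = np.ones((n_states, D))
        for k in range(n_states):
            members = X[labels == k]
            if len(members):
                means[k] = members.mean(axis=0)
                variances[k] = np.maximum(members.var(axis=0), var_floor)
        # Transition counts with add-one smoothing so no transition has zero probability.
        counts = np.ones((n_states, n_states))
        np.add.at(counts, (labels[:-1], labels[1:]), 1)
        log_trans = np.log(counts / counts.sum(axis=1, keepdims=True))
        log_start = np.full(n_states, -np.log(n_states))
        new_labels = viterbi(log_gauss_diag(X, means, variances), log_trans, log_start)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    # Collapse the frame-level state path into contiguous segments.
    cuts = np.flatnonzero(np.diff(labels)) + 1
    starts = np.concatenate(([0], cuts))
    ends = np.concatenate((cuts, [T]))
    return [(int(s), int(e), int(labels[s])) for s, e in zip(starts, ends)]


# Toy input (hypothetical): a 2-D feature track that shifts to a new level for frames 20-39.
t = np.arange(60)
level = np.where((t >= 20) & (t < 40), 5.0, 0.0)
X = np.column_stack([level + 0.1 * np.sin(t), 0.5 * level + 0.1 * np.cos(t)])
print(segment(X, n_states=2))
```

On this toy input the state path should collapse into three runs split at frames 20 and 40, with the first and last runs sharing a state, which is how a recurring elementary pattern would surface. Hard (Viterbi) assignments keep the sketch short; Baum-Welch re-estimation would use soft state posteriors instead.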