Automatic continuous affective state prediction from naturalistic facial expressions is a challenging but important research topic in human-computer interaction. One of the main challenges is modeling the dynamics that characterize naturalistic expressions. In this paper, a novel two-stage automatic system is proposed to continuously predict affective dimension values from facial expression videos. In the first stage, traditional regression methods are used to predict an affective dimension value for each individual video frame, while in the second stage, a Time-Delay Neural Network (TDNN) is proposed to model the temporal relationships between consecutive predictions. The two-stage approach separates the modeling of emotional state dynamics from the prediction of each individual emotional state from input features. As a result, the temporal information used by the TDNN is not biased by the high variability between features of consecutive frames, which allows the network to more easily exploit the slow changin...