We present a novel approach to speech-driven facial animation using a non-parametric switching state space model based on Gaussian processes. The model is an extension of the shared Gaussian process dynamical model, augmented with switching states. Audio and visual data from a talking head corpus are jointly modelled using the proposed method. The switching states are found using variable length Markov models trained on labelled phonetic data. We also propose a synthesis technique that conditions on both previous and future phonetic context, thereby capturing coarticulatory effects in speech.

Categories and Subject Descriptors
I.5.4 [Pattern Recognition]: Applications--Computer vision, Signal processing

Keywords
speech-driven facial animation, visual speech synthesis, artificial talking head

General Terms
Algorithms, Theory, Experimentation