We propose a model of the joint variation of shape and appearance of portions of an image sequence. The model is conditionally linear, and can be thought of as an extension of active appearance models to exploit the temporal correlation of adjacent image frames. Inference of the model parameters can be performed efficiently using established numerical optimization techniques borrowed from finite-element analysis and system identification techniques.