We propose a model for video scenes that exhibit temporal variability in both shape and appearance. The model is conditionally linear and can be viewed as a dynamic extension of active appearance models. We formulate the learning problem variationally, within a framework where a model complexity cost dictates the "modeling responsibility" of each factor: appearance, shape, and motion. We render the learning problem well-posed by resorting to a physical prior and a dynamic prior, and we use the finite element method to compute a numerical solution. We illustrate the use of our model to learn and simulate the shape, appearance, and motion of scenes that exhibit some form of temporal regularity, understood in a statistical sense.