In this paper, we aim to model video sequences that exhibit temporal appearance variation. The dynamic texture model proposed in [6] is effective for simple dynamic scenes; however, because of its oversimplified appearance model and under-constrained dynamics model, the visual quality of the video sequences it synthesizes is often unsatisfactory. This motivates our new model. We parameterize the nonlinear image manifold using a mixture of probabilistic principal component analyzers, align the coefficients from the different mixture components in a global coordinate system, and model the image dynamics in this global coordinate system with an autoregressive process. Experimental results show that our method captures complex temporal appearance variation and produces better synthesis results than previous work.
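To make these two building blocks concrete, a minimal formulation is sketched below; the isotropic-noise form of each analyzer and the first-order dynamics are illustrative assumptions, and the exact parameterization, including the coordinate-alignment step, is developed in the remainder of the paper. The appearance of a vectorized frame $y$ is modeled by a mixture of $M$ probabilistic principal component analyzers,
\[
p(y) = \sum_{j=1}^{M} \pi_j \, \mathcal{N}\!\left(y;\ \mu_j,\ W_j W_j^{\top} + \sigma_j^2 I\right),
\]
where $W_j$ spans the local principal subspace of the $j$-th analyzer, while the aligned global coordinates $x_t$ of frame $t$ evolve according to an autoregressive process, e.g. of first order,
\[
x_t = A\, x_{t-1} + v_t, \qquad v_t \sim \mathcal{N}(0, Q),
\]
with transition matrix $A$ and Gaussian driving noise of covariance $Q$.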