Tracking people in 3D from monocular video is often poorly constrained. To mitigate this problem, prior knowledge should be exploited. In this paper, the Gaussian process spatio-temporal variable model (GPSTVM), a novel dynamical system modeling method, is proposed for learning human pose and motion priors. The GPSTVM provides a low-dimensional embedding of human motion data, with a smooth density function that assigns higher probability to poses and motions close to the training data. The low-dimensional latent space is optimized directly to retain the spatio-temporal structure of the high-dimensional pose space. Once the prior on human pose has been learned, particle filtering can be used to track articulated human pose; the particles are propagated over time in the embedding space, which avoids the curse of dimensionality. Experiments demonstrate that our approach tracks people in 3D accurately.
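To illustrate the idea of filtering in the embedding space rather than in the full pose space, the following is a minimal sketch of a bootstrap particle filter over a 2-D latent state. It is not the GPSTVM itself: the `decode` function is a hypothetical stand-in for a learned latent-to-pose mapping, the random-walk dynamics stand in for a learned motion prior, and the isotropic Gaussian likelihood, the noise levels, and all function names are assumptions made only for this example.

```python
import math
import random


def decode(z):
    """Hypothetical stand-in for a learned latent-to-pose mapping (2-D -> 4-D)."""
    x, y = z
    return [math.sin(x), math.cos(x), math.sin(y), x * y]


def log_likelihood(obs, pose, sigma=0.1):
    """Isotropic Gaussian observation model (an assumption for this sketch)."""
    sq_err = sum((o - p) ** 2 for o, p in zip(obs, pose))
    return -0.5 * sq_err / sigma ** 2


def particle_filter_step(particles, obs, rng, step_std=0.05):
    """One predict / weight / resample step carried out in the latent space."""
    # 1. Predict: random-walk dynamics in the low-dimensional latent space.
    moved = [[zi + rng.gauss(0.0, step_std) for zi in z] for z in particles]
    # 2. Weight: compare each decoded pose with the observation.
    logw = [log_likelihood(obs, decode(z)) for z in moved]
    max_logw = max(logw)  # subtract the maximum for numerical stability
    w = [math.exp(lw - max_logw) for lw in logw]
    total = sum(w)
    w = [wi / total for wi in w]
    # 3. Estimate: weighted mean of the particles, taken before resampling.
    dim = len(moved[0])
    estimate = [sum(wi * z[d] for wi, z in zip(w, moved)) for d in range(dim)]
    # 4. Resample: multinomial resampling proportional to the weights.
    resampled = [list(z) for z in rng.choices(moved, weights=w, k=len(moved))]
    return resampled, estimate


def track(observations, n_particles=500, seed=0, init=(0.0, 0.0)):
    """Run the filter over a sequence of observations; return latent estimates."""
    rng = random.Random(seed)
    particles = [[c + rng.gauss(0.0, 0.1) for c in init]
                 for _ in range(n_particles)]
    estimates = []
    for obs in observations:
        particles, est = particle_filter_step(particles, obs, rng)
        estimates.append(est)
    return estimates
```

Because the particles live in two dimensions rather than in the full joint-angle space, a few hundred particles suffice in this toy setting; in a real tracker the decoded pose would be compared against image evidence rather than against a synthetic observation vector.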