We address the problem of learning view-invariant 3D models of human motion from motion capture data, in order to recognize human actions from a monocular video sequence captured from an arbitrary viewpoint. We propose a Spatio-Temporal Manifold (STM) model for analyzing non-linear multivariate time series with latent spatial structure, and apply it to recognize actions in the space of joint trajectories. Based on the STM, we propose a novel alignment algorithm, Dynamic Manifold Warping (DMW), and a robust motion similarity metric for human action sequences in both 2D and 3D. DMW extends previous work on spatio-temporal alignment by incorporating manifold learning. We evaluate the approach against state-of-the-art methods on motion capture data and realistic videos. Experimental results demonstrate its effectiveness: it yields visually appealing alignments, achieves higher action recognition accuracy, and recognizes actions from arbitrary views even under partial occlusion.
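To make the pipeline concrete, the sketch below illustrates the generic template that DMW builds on: embed each joint-trajectory sequence into a low-dimensional latent space, temporally align two embeddings with dynamic time warping, and use the normalized alignment cost as a similarity score for nearest-neighbor action recognition. The embedding here is plain PCA and the frame cost is Euclidean; these, along with the function names, are illustrative stand-ins and not the STM/DMW formulation from the paper.

```python
import numpy as np

def embed(X, d=3):
    """Project a (T x D) joint-trajectory sequence onto its top-d principal
    directions -- a simple stand-in for the latent spatial structure that a
    manifold learner (e.g. the paper's STM model) would recover."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T

def dtw_cost(A, B):
    """Classic dynamic-time-warping alignment between two embedded sequences;
    returns a length-normalized accumulated cost (lower = more similar).
    DMW replaces this Euclidean frame cost with a manifold-aware one,
    which is not reproduced here."""
    n, m = len(A), len(B)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = np.linalg.norm(A[i - 1] - B[j - 1])
            D[i, j] = c + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m] / (n + m)

def classify(query, exemplars, labels, d=3):
    """Nearest-neighbor action recognition: label the query sequence with the
    label of the labeled exemplar it aligns to most cheaply."""
    q = embed(query, d)
    costs = [dtw_cost(q, embed(E, d)) for E in exemplars]
    return labels[int(np.argmin(costs))]
```

In this template, view invariance and robustness to partial occlusion would come from the choice of embedding and frame cost, which is precisely where the proposed STM/DMW machinery differs from the simple PCA-plus-DTW baseline sketched above.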