In this paper, we present a novel method for human action recognition that combines a global movement feature with a local configuration feature. A human action is represented as a sequence of joints in 4D spatio-temporal space and modeled by two HMMs: a conventional HMM for the global movement feature and an exemplar-based HMM for the local configuration feature. First, an adaptive particle filter is adopted to track the 3D joints of a marker-less actor. Then, the combined features are extracted from the full-body tracking results. Finally, actions are classified by fusing the outputs of the two HMMs. The effectiveness of the proposed algorithm is demonstrated in experiments on 7 actions performed by 12 actors. The results demonstrate the robustness of the proposed method with respect to viewpoint and actor variations.
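As an illustration, the final classification step can be sketched as a late fusion of the two HMMs' per-class log-likelihoods. The fusion weight `alpha` and the score values below are hypothetical, since the abstract does not specify the exact fusion rule:

```python
def fuse_hmm_scores(global_loglik, config_loglik, alpha=0.6):
    """Fuse per-class log-likelihoods from the two HMMs.

    global_loglik / config_loglik: dicts mapping action label -> log P(obs | class)
    from the global-movement HMM and the exemplar-based configuration HMM.
    alpha: hypothetical weight given to the global-movement HMM.
    Returns the action label with the highest fused score.
    """
    fused = {a: alpha * global_loglik[a] + (1 - alpha) * config_loglik[a]
             for a in global_loglik}
    return max(fused, key=fused.get)

# Toy log-likelihoods for three of the seven actions (illustrative values only).
g = {"walk": -120.5, "wave": -140.2, "sit": -150.9}
c = {"walk": -95.3, "wave": -90.1, "sit": -110.7}
print(fuse_hmm_scores(g, c))  # → walk
```

With `alpha = 1` the decision reduces to the global-movement HMM alone, and with `alpha = 0` to the configuration HMM alone; intermediate values trade off the two cues.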