Abstract. This paper introduces a model-based approach for simultaneously reconstructing 3D human motion and full-body skeletal size from a small set of 2D image features tracked in uncalibrated monocular video sequences. The key idea is to construct a generative human motion model from a large set of preprocessed human motion examples, which constrains the solution space of monocular human motion tracking. In addition, we learn a generative skeleton model from prerecorded human skeleton data to reduce ambiguity in the skeleton reconstruction. We formulate reconstruction as a nonlinear optimization problem in which the generative models are continuously deformed to best match the tracked 2D image features. We evaluate the system on a variety of uncalibrated monocular video sequences.
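
As an illustrative reading of this formulation, the reconstruction can be written as a single objective. The notation here is an assumption made for exposition and is not taken from the paper: $q_t$ is the pose at frame $t$, $s$ the skeletal size parameters, $\rho$ the unknown camera parameters, $f_j(q_t, s)$ the 3D position of feature $j$ under forward kinematics, $\Pi_\rho$ the camera projection, and $x_{t,j}$ the tracked 2D location of feature $j$ in frame $t$. A generic objective of this kind would be

\[
\min_{q_{1:T},\, s,\, \rho} \;
\sum_{t=1}^{T} \sum_{j=1}^{J}
\bigl\| \Pi_\rho\bigl(f_j(q_t, s)\bigr) - x_{t,j} \bigr\|^2
\;+\; \lambda_m \, E_{\mathrm{motion}}(q_{1:T})
\;+\; \lambda_s \, E_{\mathrm{skel}}(s)
\]

where the first term measures 2D reprojection error, $E_{\mathrm{motion}}$ and $E_{\mathrm{skel}}$ are hypothetical prior terms (for example, negative log-likelihoods) under the generative motion and skeleton models, and $\lambda_m$, $\lambda_s$ are hypothetical weights. The parameterization, prior terms, and weighting actually used may differ from this sketch.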