We present an approach to tracking human activities in monocular video. We model the human body by decomposing it into a torso and limbs, each approximated by a simple 3D shape. Limb motions are parametrized by relative joint angles. Motion tracking and estimation are posed as nonlinear state estimation problems. The measurements are computed from the outputs of 3D shape-encoded filters, which extract boundary gradient information from the body image. The uncertainty of the body pose is propagated by a branching particle system. We first sample a set of particles approximating the initial distribution of the state vector conditioned on the observations, where each particle encodes a body pose. The posterior density is represented by the particle weights, where each weight measures geometric and temporal fit and is computed bottom-up from the raw image using a shape-encoded filter. The particles branch so that the mean number of offspring is proportional t...
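
The branching step can be illustrated with a minimal sketch. It assumes that the mean number of offspring of each particle is proportional to its normalized weight, which is a common choice in branching particle filters and is not confirmed by the text above. The function name `branch`, its parameters, and the representation of a pose as an arbitrary Python object are hypothetical and serve only as illustration.

```python
import math
import random


def branch(particles, weights, rng=None):
    """Return the offspring of a weighted particle set (hypothetical helper).

    Assumption: particle i receives a random integer number of offspring
    whose expectation is n * w_i, where w_i is its normalized weight and
    n is the current population size.
    """
    rng = rng or random.Random()
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    n = len(particles)
    offspring = []
    for particle, weight in zip(particles, weights):
        expected = n * weight / total  # mean offspring count for this particle
        base = math.floor(expected)
        # Bernoulli rounding: add one extra copy with probability equal to
        # the fractional part, so the count is an integer with mean `expected`.
        extra = 1 if rng.random() < expected - base else 0
        offspring.extend([particle] * (base + extra))
    return offspring
```

With this rounding rule each particle produces either the floor or the ceiling of its expected count, so well-fitting poses are replicated and poorly fitting ones tend to die out. The total population size fluctuates around `n` rather than staying fixed.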