We combine detection and tracking techniques to achieve robust 3-D motion recovery of people seen from arbitrary viewpoints by a single, potentially moving camera. We rely on detecting key postures, which can be done reliably; on a motion model to infer 3-D poses between consecutive detections; and finally on refining these poses over the whole sequence using a generative model. We demonstrate our approach on people walking against cluttered backgrounds and filmed with a moving camera, which precludes the use of simple background-subtraction techniques. In this case, the easy-to-detect posture is the one that occurs at the end of each step, when people have their legs furthest apart.