Abstract—Models that can efficiently, compactly, and semantically represent potential users are important tools for human-robot interaction applications. We model a person as a projection of a generic 3D articulated model and propose a method to estimate its joint positions from image data in an optimization framework. This is done by constructing a function that grades a configuration of joints according to how well it matches the underlying image and model-based priors. We then search for local optima in this space both efficiently and exhaustively by assembling partial configurations in a bottom-up manner. Working from the leaves of the tree to its root, we maintain a list of locally optimal yet sufficiently distinct candidate configurations for the body pose. We then adapt this algorithm for use on a sequence of images to make it even more efficient by considering configurations that are near their positions in the previous frame. This way, the number of partial configurations g...
Matheen Siddiqui, Gérard G. Medioni
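The bottom-up assembly described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the scoring terms (a per-joint image score plus a limb-length penalty), the names `assemble`, `keep_distinct`, `detections`, and `limb_len`, and the toy arm data are all assumptions made for illustration. Each joint is assumed to have at least one candidate position.

```python
import math

def keep_distinct(cands, k, min_dist):
    """Greedily keep up to k best-scoring candidates whose positions
    are at least min_dist apart (hypothetical distinctness rule)."""
    kept = []
    for score, pos, config in sorted(cands, key=lambda c: -c[0]):
        if all(math.dist(pos, p) >= min_dist for _, p, _ in kept):
            kept.append((score, pos, config))
        if len(kept) == k:
            break
    return kept

def assemble(joint, children, detections, limb_len, k=3, min_dist=2.0):
    """Return up to k partial configurations for the subtree rooted at joint.

    Each result is (score, joint_position, {joint_name: position}).
    Subtrees are solved first, so work proceeds from the leaves to the root.
    """
    child_lists = [
        (c, assemble(c, children, detections, limb_len, k, min_dist))
        for c in children.get(joint, [])
    ]
    cands = []
    for pos, img_score in detections[joint]:
        score, config = img_score, {joint: pos}
        for child, options in child_lists:
            # Assumed prior: penalize deviation from the expected limb length.
            def linked(o):
                return o[0] - abs(math.dist(pos, o[1]) - limb_len[child])
            best = max(options, key=linked)
            score += linked(best)
            config.update(best[2])
        cands.append((score, pos, config))
    return keep_distinct(cands, k, min_dist)

# Toy kinematic chain: shoulder -> elbow -> wrist (hypothetical data).
children = {"shoulder": ["elbow"], "elbow": ["wrist"]}
detections = {
    "shoulder": [((0.0, 0.0), 1.0)],
    "elbow": [((0.0, 3.0), 0.9), ((5.0, 5.0), 0.95)],
    "wrist": [((0.0, 6.0), 0.8), ((9.0, 9.0), 0.1)],
}
limb_len = {"elbow": 3.0, "wrist": 3.0}

poses = assemble("shoulder", children, detections, limb_len)
best_score, _, best_pose = poses[0]
print(best_pose)
```

Keeping several distinct candidates per joint, rather than only the single best, is what lets a parent joint recover when a child's top-scoring position turns out to be inconsistent with the rest of the body. For the sequence case described above, one plausible adaptation would be to filter `detections` to positions near the previous frame's estimate before calling `assemble`.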