Inferring 3D body pose and viewpoint from a single silhouette image is a challenging problem. We present a new generative model that represents shape deformations caused by changes in view and body configuration on a two-dimensional manifold. We model the two continuous states as a product space (different configurations × different views) embedded on a conceptual two-dimensional torus manifold. We learn a nonlinear mapping between the torus manifold embedding and the visual input (silhouettes) using empirical kernel mapping. Since every view and body pose has a corresponding embedding point on the torus manifold, inferring view and body pose from a given image reduces to estimating the embedding point for that input. Because shape varies across people even for the same view and body pose, we extend the model to adapt to different people by decomposing person-dependent style factors. Experimental results with real data as well as synthetic data show simultaneous estimation of v...
Ahmed M. Elgammal, Chan-Su Lee
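
The idea summarized in the abstract, a torus embedding for (view, body configuration) pairs, a learned nonlinear mapping from the torus to the observation, and inference by estimating the embedding point, can be illustrated with a small sketch. It is not the authors' implementation. Everything concrete in it is an assumption made for illustration: the torus radii `R` and `r`, the Gaussian RBF ridge regression used as a stand-in for the empirical kernel mapping, the hypothetical `toy_descriptor` that replaces real silhouette features, and the brute-force grid search used for inference. The person-dependent style decomposition is left out.

```python
import numpy as np

TWO_PI = 2.0 * np.pi


def torus_embed(mu, nu, R=2.0, r=1.0):
    """Embed view angle mu and body-configuration phase nu on a 3D torus.

    R and r are assumed radii, chosen only for this illustration.
    """
    mu = np.asarray(mu, dtype=float)
    nu = np.asarray(nu, dtype=float)
    ring = R + r * np.cos(nu)
    return np.stack([ring * np.cos(mu), ring * np.sin(mu), r * np.sin(nu)], axis=-1)


def rbf_features(points, centers, sigma=1.0):
    """Gaussian RBF responses of each embedding point to each center."""
    sq_dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def fit_mapping(points, observations, centers, sigma=1.0, reg=1e-6):
    """Ridge-regularized least squares from RBF features to observations."""
    phi = rbf_features(points, centers, sigma)
    gram = phi.T @ phi + reg * np.eye(len(centers))
    return np.linalg.solve(gram, phi.T @ observations)


def infer_view_and_pose(observation, coeffs, centers, sigma=1.0, n_grid=72):
    """Return the grid point (mu, nu) whose predicted observation is closest."""
    mu_flat, nu_flat = grid_angles(n_grid)
    predicted = rbf_features(torus_embed(mu_flat, nu_flat), centers, sigma) @ coeffs
    best = np.argmin(((predicted - observation) ** 2).sum(axis=-1))
    return mu_flat[best], nu_flat[best]


def grid_angles(n):
    """All (mu, nu) pairs on an n-by-n regular grid over [0, 2*pi)."""
    angles = np.arange(n) * TWO_PI / n
    mu_grid, nu_grid = np.meshgrid(angles, angles, indexing="ij")
    return mu_grid.ravel(), nu_grid.ravel()


def toy_descriptor(mu, nu):
    """Hypothetical stand-in for a silhouette descriptor (not real image data)."""
    mu = np.asarray(mu, dtype=float)
    nu = np.asarray(nu, dtype=float)
    return np.stack(
        [np.cos(mu), np.sin(mu), np.cos(nu), np.sin(nu), np.cos(mu) * np.sin(nu)],
        axis=-1,
    )


# Learn the torus-to-observation mapping from synthetic training pairs.
train_mu, train_nu = grid_angles(32)
center_mu, center_nu = grid_angles(16)
centers = torus_embed(center_mu, center_nu)
coeffs = fit_mapping(
    torus_embed(train_mu, train_nu), toy_descriptor(train_mu, train_nu), centers
)

# Recover view and body configuration for a single synthetic observation.
true_mu, true_nu = np.pi / 2, 5 * np.pi / 4
mu_hat, nu_hat = infer_view_and_pose(toy_descriptor(true_mu, true_nu), coeffs, centers)
print("estimated view angle:", mu_hat, "estimated configuration phase:", nu_hat)
```

The grid search keeps the sketch short; any optimizer or sampler over the two torus angles could replace it, since the only requirement is a search over embedding points that minimizes the reconstruction error.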