Many difficult visual perception problems, like 3D human motion estimation, can be formulated as inference with complex generative models defined over high-dimensional state spaces. Despite progress, optimizing such models remains difficult because prior knowledge cannot be flexibly integrated to reshape an initially designed representation space. Nonlinearities, the inherent sparsity of high-dimensional training sets, and the lack of global continuity make dimensionality reduction challenging and low-dimensional search inefficient. To address these problems, we present a learning and inference algorithm that restricts visual tracking to automatically extracted, non-linearly embedded, low-dimensional spaces. This formulation produces a layered generative model with a reduced state representation that can be estimated using efficient continuous optimization methods. Our prior flattening method allows a simple analytic treatment of low-dimensional intrinsic curvature constraints, a...
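The abstract does not specify which non-linear embedding is used; as an illustration only, the kind of automatically extracted low-dimensional space described above can be sketched with a spectral method such as Laplacian eigenmaps. The function name, neighborhood size `k`, and kernel width `sigma` below are assumptions for this sketch, not the authors' method.

```python
import numpy as np

def laplacian_eigenmap(X, n_components=2, k=10, sigma=1.0):
    """Illustrative spectral embedding of n high-dimensional points X (n x d)
    into n_components dimensions (a sketch, not the paper's algorithm)."""
    n = X.shape[0]
    # pairwise squared Euclidean distances
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    # k-nearest-neighbour graph with Gaussian edge weights (self excluded)
    W = np.zeros((n, n))
    idx = np.argsort(d2, axis=1)[:, 1:k + 1]
    for i in range(n):
        W[i, idx[i]] = np.exp(-d2[i, idx[i]] / (2 * sigma ** 2))
    W = np.maximum(W, W.T)          # symmetrize the graph
    D = W.sum(axis=1)               # node degrees
    # symmetric normalized graph Laplacian: I - D^{-1/2} W D^{-1/2}
    L = np.eye(n) - W / np.sqrt(D)[:, None] / np.sqrt(D)[None, :]
    vals, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    # drop the trivial constant eigenvector; rescale back to the random-walk form
    return vecs[:, 1:n_components + 1] / np.sqrt(D)[:, None]

# usage: embed noisy 3D points lying near a 1D circle into 2D
rng = np.random.default_rng(0)
t = rng.uniform(0, 2 * np.pi, 60)
X = np.c_[np.cos(t), np.sin(t), 0.05 * rng.standard_normal(60)]
Y = laplacian_eigenmap(X, n_components=2, k=8)
```

Tracking would then proceed by optimizing the generative model's likelihood over the embedded coordinates `Y` rather than the original high-dimensional state, which is the efficiency argument the abstract makes.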
Cristian Sminchisescu, Allan D. Jepson