We present a method to simultaneously estimate 3d body pose and action categories from monocular video sequences. Our approach learns a lowdimensional embedding of the pose manifolds using Locally Linear Embedding (LLE), as well as the statistical relationship between body poses and their image appearance. In addition, the dynamics in these pose manifolds are modelled. Sparse kernel regressors capture the nonlinearities of these mappings efficiently. Body poses are inferred by a recursive Bayesian sampling algorithm with an activity-switching mechanism based on learned transfer functions. Using a rough foreground segmentation, we compare Binary PCA and distance transforms to encode the appearance. As a postprocessing step, the globally optimal trajectory through the entire sequence is estimated, yielding a single pose estimate per frame that is consistent throughout the sequence. We evaluate the algorithm on challenging sequences with subjects that are alternating between running and ...
Tobias Jaeggli, Esther Koller-Meier, Luc J. Van Go