Markerless tracking of human pose is a hard yet relevant problem. In this paper, we derive an efficient filtering algorithm for tracking human pose at 4-10 frames per second using a stream of monocular depth images. The key idea is to combine an accurate generative model--which is achievable in this setting using programmable graphics hardware--with a discriminative model that provides data-driven evidence about body part locations. In each filter iteration, we apply a form of local model-based search that exploits the nature of the kinematic chain. As fast movements and occlusion can disrupt the local search, we utilize a set of discriminatively trained patch classifiers to detect body parts. We describe a novel algorithm for propagating this noisy evidence about body part locations up the kinematic chain using the unscented transform. The resulting distribution of body configurations allows us to reinitialize the model-based search, which in turn allows our system to robustly recover from tracking failures caused by fast motion or occlusion.
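To make the role of the unscented transform concrete, the following is a minimal sketch (in Python/NumPy, not taken from the paper) of how a Gaussian belief over joint angles can be pushed through a nonlinear forward-kinematics map to obtain an approximate Gaussian over a body part's position. The function names, the two-link planar limb, and all parameter values are illustrative assumptions, not the system's actual implementation.

```python
import numpy as np

def unscented_transform(mean, cov, f, alpha=1e-3, beta=2.0, kappa=0.0):
    """Propagate a Gaussian (mean, cov) through a nonlinear function f
    using the standard scaled unscented (sigma-point) transform."""
    n = mean.shape[0]
    lam = alpha**2 * (n + kappa) - n

    # Sigma points: the mean plus symmetric perturbations along the
    # columns of the scaled matrix square root of the covariance.
    sqrt_cov = np.linalg.cholesky((n + lam) * cov)
    sigma_pts = np.vstack([mean,
                           mean + sqrt_cov.T,
                           mean - sqrt_cov.T])          # shape (2n+1, n)

    # Standard weights for reconstructing the mean and covariance.
    w_m = np.full(2 * n + 1, 1.0 / (2 * (n + lam)))
    w_c = w_m.copy()
    w_m[0] = lam / (n + lam)
    w_c[0] = lam / (n + lam) + (1.0 - alpha**2 + beta)

    # Push each sigma point through the nonlinearity.
    y = np.array([f(x) for x in sigma_pts])             # shape (2n+1, m)

    y_mean = w_m @ y
    diff = y - y_mean
    y_cov = (w_c[:, None] * diff).T @ diff
    return y_mean, y_cov

# Hypothetical forward kinematics of a toy two-joint planar limb,
# mapping joint angles to the 2D position of its end point.
def forward_kinematics(theta, l1=0.3, l2=0.25):
    x = l1 * np.cos(theta[0]) + l2 * np.cos(theta[0] + theta[1])
    y = l1 * np.sin(theta[0]) + l2 * np.sin(theta[0] + theta[1])
    return np.array([x, y])

mu = np.array([0.4, -0.2])        # joint-angle mean (radians)
P = np.diag([0.05, 0.08])         # joint-angle uncertainty
part_mean, part_cov = unscented_transform(mu, P, forward_kinematics)
```

In the same spirit, one could apply this sigma-point machinery along each segment of a kinematic chain, so that noisy evidence about a body part's location can be related back to the pose parameters higher up the chain; the details of how the paper's system does this are given in the body of the paper, not in this sketch.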