This paper presents a learning-based method for combining shape and appearance features for 3D human pose estimation from single-view images. The method clusters the 3D pose space into several modular regions and, for each region, learns a regressor for each feature type together with their optimal fusion strategy. In this way, the complementary information of the two feature types is exploited, leading to improved pose estimation. We train and evaluate our method on a synchronized video and 3D motion dataset. Our experimental results show that the proposed feature combination yields more accurate pose estimates than either feature type alone.
Suman Sedai, Mohammed Bennamoun, Du Q. Huynh
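The region-wise regression-and-fusion idea described in the abstract can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the data are synthetic, k-means stands in for the pose-space clustering, ridge regression stands in for the per-region regressors, and the fusion weight is a simple inverse-error heuristic standing in for the learned optimal fusion.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Synthetic stand-ins: 3D pose vectors and two feature types derived
# from them (shape-like and appearance-like), with additive noise.
n, d_pose = 300, 6
poses = rng.normal(size=(n, d_pose))
shape_feats = poses @ rng.normal(size=(d_pose, 10)) + 0.1 * rng.normal(size=(n, 10))
app_feats = poses @ rng.normal(size=(d_pose, 8)) + 0.1 * rng.normal(size=(n, 8))

# Step 1: cluster the pose space into modular regions.
k = 3
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(poses)
labels = km.labels_

# Step 2: per region, learn one regressor per feature type plus a
# fusion weight (here: inverse training error, a toy heuristic).
models = {}
for c in range(k):
    idx = labels == c
    reg_shape = Ridge().fit(shape_feats[idx], poses[idx])
    reg_app = Ridge().fit(app_feats[idx], poses[idx])
    err_s = np.mean((reg_shape.predict(shape_feats[idx]) - poses[idx]) ** 2)
    err_a = np.mean((reg_app.predict(app_feats[idx]) - poses[idx]) ** 2)
    w_shape = err_a / (err_s + err_a)  # weight the more reliable regressor higher
    models[c] = (reg_shape, reg_app, w_shape)

def estimate_pose(sf, af, region):
    """Fuse the two regressors' predictions for the given region."""
    reg_shape, reg_app, w = models[region]
    return w * reg_shape.predict(sf[None]) + (1 - w) * reg_app.predict(af[None])
```

At test time the region would itself have to be estimated from the image (the paper learns this from data); the sketch simply takes it as an argument to keep the fusion step visible.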