This article describes a multiple-feature data fusion scheme within an auxiliary particle filter for markerless tracking of 3D two-arm gestures using a single camera mounted on a mobile robot. Each human limb is modelled as a set of linked degenerate quadrics, truncated by pairs of planes that are themselves modelled as degenerate quadrics. The method projects the model's silhouette, together with local features located on the model surface, into the image in order to score the particles (i.e. the associated model configurations) and retain those that give the best model-to-image fit. Our cost metric robustly combines two image cues, namely model contours and colour- or texture-based patches located on the model surface, subject to 3D joint-limit and non-self-intersection constraints. Experimental results demonstrate the robustness and versatility of this data-fusion-based approach.
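The abstract does not state the filter equations, so the following is only a hedged illustration of how a two-stage auxiliary particle filter step can be driven by a fused, constrained likelihood. Everything concrete in it is an assumption: the toy state (four joint angles), the Gaussian stand-ins for the contour and patch cues, the per-cue floor used as a simple form of robust fusion, the random-walk dynamics, and the helper names `fused_likelihood`, `apf_step`, `self_intersects` and `estimate`. The self-intersection test is a stub; a real tracker would derive both cues from projecting the quadric model into the image.

```python
import math
import random

# Hypothetical joint limits (radians) for a toy 4-DOF arm configuration.
JOINT_LIMITS = [(-1.5, 1.5), (-1.0, 2.0), (-1.5, 1.5), (0.0, 2.5)]
# Assumed per-cue floor: a single failing cue lowers, but cannot veto, a particle.
CUE_FLOOR = 1e-3


def within_limits(x):
    return all(lo <= v <= hi for v, (lo, hi) in zip(x, JOINT_LIMITS))


def self_intersects(x):
    # Stub (hypothetical): a real tracker would test the truncated-quadric limbs for overlap.
    return False


def gaussian_cue(x, target, sigma):
    # Toy cue: Gaussian similarity between a configuration and a cue "target".
    d2 = sum((a - b) ** 2 for a, b in zip(x, target))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def fused_likelihood(x, z):
    # Hard constraints give zero likelihood; otherwise multiply the floored cues.
    if not within_limits(x) or self_intersects(x):
        return 0.0
    contour = max(gaussian_cue(x, z["contour"], 0.3), CUE_FLOOR)
    patch = max(gaussian_cue(x, z["patch"], 0.5), CUE_FLOOR)
    return contour * patch


def apf_step(particles, weights, z, rng, noise=0.1):
    n = len(particles)
    # Stage 1: score each particle at its predicted mean. For the assumed
    # random-walk dynamics the predicted mean is the particle itself.
    first = [max(fused_likelihood(p, z), 1e-12) for p in particles]
    lam = [w * f for w, f in zip(weights, first)]
    ancestors = rng.choices(range(n), weights=lam, k=n)
    # Stage 2: propagate each chosen ancestor, then correct for the look-ahead.
    new_particles, new_weights = [], []
    for k in ancestors:
        x = [v + rng.gauss(0.0, noise) for v in particles[k]]
        new_particles.append(x)
        new_weights.append(fused_likelihood(x, z) / first[k])
    total = sum(new_weights)
    if total == 0.0:
        # Every particle violated a constraint: fall back to uniform weights.
        return new_particles, [1.0 / n] * n
    return new_particles, [w / total for w in new_weights]


def estimate(particles, weights):
    # Weighted mean configuration.
    dim = len(particles[0])
    return [sum(w * p[d] for p, w in zip(particles, weights)) for d in range(dim)]


# Usage on a static toy target: both cues point near the same configuration.
rng = random.Random(0)
truth = [0.4, 0.8, -0.2, 1.2]
z = {"contour": truth, "patch": [v + 0.05 for v in truth]}
n = 500
particles = [[rng.uniform(lo, hi) for lo, hi in JOINT_LIMITS] for _ in range(n)]
weights = [1.0 / n] * n
for _ in range(20):
    particles, weights = apf_step(particles, weights, z, rng)
est = estimate(particles, weights)
print([round(v, 2) for v in est])
```

Dividing by the stage-1 likelihood in stage 2 compensates for the look-ahead used to pick ancestors, so the returned weights target the filtering distribution rather than the look-ahead-biased one. Returning zero likelihood for constraint violations means such configurations are never selected as ancestors, while the cue floor keeps one unreliable cue from eliminating an otherwise good particle.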