Abstract

We present a new system for 3D head tracking and pose estimation in low-resolution, multi-view environments. Our approach is a joint particle filter scheme that combines head shape evaluation with histograms of oriented gradients (HOG) and pose estimation by means of artificial neural networks. This joint evaluation resolves earlier problems with automatic alignment and multi-sensor fusion, and results in a fully automatic system that adapts to changes in the number of available cameras. We evaluate on the CLEAR07 dataset for multi-view head pose estimation and achieve mean pose errors of 7.2° and 9.3° for pan and tilt, respectively, an improvement in accuracy over our previous work of 14.9% and 25.8%.
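To make the idea of a joint particle filter more concrete, the following is a minimal sketch written under explicit assumptions. It is not the authors' implementation. The following are all illustrative assumptions:

- the state layout (3D head position plus pan and tilt),
- the random-walk motion model,
- systematic resampling,
- the two hypothetical callbacks `shape_score` and `pose_score`, which stand in for the HOG head-shape evaluation and the neural-network pose estimator.

Each particle is weighted by the product of both scores over whichever cameras are currently available. This is one plausible way to obtain flexibility with respect to the number of cameras.

```python
import math
import random
from dataclasses import dataclass, replace
from typing import Callable, List, Sequence, Tuple


@dataclass
class Particle:
    # Hypothetical state: 3D head position (metres) and orientation (degrees).
    x: float
    y: float
    z: float
    pan: float
    tilt: float
    weight: float = 1.0


# Per-camera score in (0, 1]; a stand-in (assumption) for a HOG head-shape
# model or a neural-network pose estimator evaluated on that camera's view.
Score = Callable[[Particle, int], float]


def propagate(particles: List[Particle], rng: random.Random,
              pos_sigma: float = 0.05, ang_sigma: float = 5.0) -> List[Particle]:
    """Random-walk motion model: add Gaussian noise to position and angles."""
    return [Particle(p.x + rng.gauss(0, pos_sigma), p.y + rng.gauss(0, pos_sigma),
                     p.z + rng.gauss(0, pos_sigma), p.pan + rng.gauss(0, ang_sigma),
                     p.tilt + rng.gauss(0, ang_sigma), p.weight)
            for p in particles]


def weigh(particles: List[Particle], cameras: Sequence[int],
          shape_score: Score, pose_score: Score) -> None:
    """Joint weight = product of shape and pose scores over all available cameras."""
    # Sum logs instead of multiplying so that many cameras do not underflow.
    logs = [sum(math.log(shape_score(p, c)) + math.log(pose_score(p, c))
                for c in cameras)
            for p in particles]
    best = max(logs)
    raw = [math.exp(lw - best) for lw in logs]
    total = sum(raw)
    for p, w in zip(particles, raw):
        p.weight = w / total  # normalised; uniform if no camera is available


def estimate(particles: List[Particle]) -> Tuple[float, ...]:
    """Weighted mean state; assumes angles stay away from the +/-180 deg wrap."""
    return tuple(sum(p.weight * getattr(p, f) for p in particles)
                 for f in ("x", "y", "z", "pan", "tilt"))


def resample(particles: List[Particle], rng: random.Random) -> List[Particle]:
    """Systematic resampling: draw n particles in proportion to their weights."""
    n = len(particles)
    step = 1.0 / n
    u = rng.random() * step
    out, i, cumulative = [], 0, particles[0].weight
    for k in range(n):
        target = u + k * step
        while target > cumulative and i < n - 1:
            i += 1
            cumulative += particles[i].weight
        out.append(replace(particles[i], weight=step))
    return out


def track_step(particles, cameras, shape_score, pose_score, rng):
    """One filter iteration: predict, weigh jointly, estimate, resample."""
    particles = propagate(particles, rng)
    weigh(particles, cameras, shape_score, pose_score)
    state = estimate(particles)
    return resample(particles, rng), state
```

Because the weight is a product over the `cameras` argument, adding or removing a view only changes that list. When no camera is available, the weights simply fall back to uniform. Working in log space is a design choice that keeps the product numerically stable as the number of views grows.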