Abstract. This paper presents a method to recover the 3D configuration of a face in each frame of a video. The 3D configuration consists of three translational parameters and three orientation parameters, corresponding to the yaw, pitch, and roll of the face. Such information is important for applications such as face modeling, recognition, and expression analysis, which require head stabilization. The approach combines the structural advantages of geometric modeling with the statistical advantages of particle-filter-based inference. The face is modeled as the curved surface of a cylinder that is free to translate and rotate arbitrarily. The geometric modeling accounts for pose and self-occlusion, while the statistical modeling handles moderate occlusion and illumination variations. Experimental results on multiple datasets demonstrate the efficacy of the approach. The insensitivity of the approach to camera calibration parameters (focal length) is also shown.
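To make the high-level pipeline concrete, the following is a minimal sketch (not the authors' implementation) of particle-filter inference over a 6-DOF head-pose state [tx, ty, tz, yaw, pitch, roll]. The appearance likelihood below is a synthetic stand-in for the cylinder-texture image likelihood used in the paper; all function names, parameter values, and the simulated input are illustrative assumptions.

```python
# Sketch of sequential importance resampling for 6-DOF head-pose tracking.
# Assumptions: random-walk dynamics, a placeholder likelihood, synthetic input.
import numpy as np

rng = np.random.default_rng(0)

N_PARTICLES = 500
STATE_DIM = 6                                              # tx, ty, tz, yaw, pitch, roll
MOTION_STD = np.array([2.0, 2.0, 5.0, 0.02, 0.02, 0.02])   # random-walk noise per dimension


def appearance_likelihood(pose, frame):
    """Stand-in for the cylinder-based image likelihood: scores a hypothesized
    pose against a noisy 'observed' pose (in the real system, 'frame' would be
    an image and the score would come from warping the cylinder texture)."""
    err = pose - frame
    return np.exp(-0.5 * np.sum((err / MOTION_STD) ** 2))


def track(frames):
    # Initialize particles around the origin (frontal, centered face).
    particles = rng.normal(0.0, MOTION_STD, size=(N_PARTICLES, STATE_DIM))
    estimates = []
    for frame in frames:
        # 1. Propagate particles with random-walk dynamics.
        particles += rng.normal(0.0, MOTION_STD, size=particles.shape)
        # 2. Weight each particle by the (stand-in) appearance likelihood.
        weights = np.array([appearance_likelihood(p, frame) for p in particles])
        weights = (weights + 1e-12) / (weights.sum() + 1e-12 * N_PARTICLES)
        # 3. Posterior-mean pose estimate for this frame.
        estimates.append(weights @ particles)
        # 4. Resample to avoid weight degeneracy.
        idx = rng.choice(N_PARTICLES, size=N_PARTICLES, p=weights)
        particles = particles[idx]
    return np.array(estimates)


if __name__ == "__main__":
    # Simulated 'frames': a slowly yawing head, used only to exercise the loop.
    true_poses = np.zeros((30, STATE_DIM))
    true_poses[:, 3] = np.linspace(0.0, 0.3, 30)            # yaw ramp (radians)
    noisy = true_poses + rng.normal(0.0, 0.01, true_poses.shape)
    print(track(noisy)[-1])                                 # estimated final pose
```

In the actual method, step 2 would render the hypothesized cylinder pose into the image and compare the mapped face texture with the observed frame, which is what lets the geometric model handle pose and self-occlusion while the particle filter absorbs moderate occlusion and illumination changes.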