We present a discriminative approach to frame-by-frame head pose tracking that is robust to a wide range of illuminations and facial appearances and that is inherently immune to accuracy drift. Most previous research on head pose tracking has been validated on test datasets spanning only a small (< 20) subjects under controlled illumination conditions on continuous video sequences. In contrast, the system presented in this paper was both trained and tested on a much larger database, GENKI, spanning tens of thousands of different subjects, illuminations, and geographical locations from images on the Web. Our pose estimator achieves accuracy of 5.82◦ , 5.65◦ , and 2.96◦ root-meansquare (RMS) error for yaw, pitch, and roll, respectively. A set of 4000 images from this dataset, labeled for pose, was collected and released for use by the research community.
Jacob Whitehill, Javier R. Movellan