In this paper, we show how to accurately and efficiently estimate the 3D motion of a rigid or non-rigid object, together with the time-varying lighting, in a dynamic scene. This is achieved within an inverse compositional tracking framework using a novel warping function that involves a 2D → 3D → 2D transformation. The method is guaranteed to converge, handles both rigid and non-rigid objects, and estimates the lighting and motion jointly from a video sequence. Experimental analysis on multiple face video sequences shows a significant speed-up over existing methods while retaining a high level of accuracy.
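For context, the following is a minimal sketch of the standard inverse compositional alignment step (in the style of Baker and Matthews), which a 2D → 3D → 2D warp would instantiate; the template $T$, input image $I$, warp $W$, and parameters $p$ below are generic placeholders and do not reproduce the paper's specific warping function or lighting model:
\begin{align}
\Delta p &= \arg\min_{\Delta p} \sum_{\mathbf{x}} \big[\, T(W(\mathbf{x}; \Delta p)) - I(W(\mathbf{x}; p)) \,\big]^2, \\
W(\mathbf{x}; p) &\leftarrow W(\mathbf{x}; p) \circ W(\mathbf{x}; \Delta p)^{-1}.
\end{align}
Because the increment $\Delta p$ is computed on the fixed template $T$, the Jacobian and Hessian can be precomputed once, which is the source of the efficiency gains that inverse compositional tracking offers over forward additive schemes.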
Yilei Xu, Amit K. Roy-Chowdhury