Most work in computer vision has concentrated on studying the individual effects of motion and illumination on a 3D object. In this paper, we present a theory for combining the effects of motion, illumination, 3D structure, albedo, and camera parameters in a sequence of images obtained by a perspective camera. We show that the set of all Lambertian reflectance functions of a moving object, illuminated by arbitrarily distant light sources, lies ”close” to a bilinear subspace consisting of nine illumination variables and six motion variables. This result implies that, given an arbitrary video sequence, it is possible to recover the 3D structure, motion and illumination conditions simultaneously using the bilinear subspace formulation. The derivation is based on the intuitive notion that, given an illumination direction, the images of a moving surface cannot change suddenly over a short time period. We experimentally compare the images obtained using our theory with ground truth dat...
Yilei Xu, Amit K. Roy Chowdhury