We present a novel approach to inferring the 3D volumetric shape of both moving objects and the static background from video sequences shot by a moving camera, under the assumption that the objects move rigidly on a ground plane. The 3D scene is divided into a set of volume elements, termed voxels, organized in an adaptive octree structure. Each voxel is assigned a label at each time instant: empty, part of the static background, or part of a moving object. The task of shape inference is then formulated as assigning each voxel a dynamic label that minimizes photo and motion variance between the voxels and the original sequence. We propose a three-step voxel labeling method based on a robust photo-motion variance measure. First, a sparse set of surface points is used to initialize a subset of voxels. Then, a deterministic voxel coloring scheme carves away the voxels with large variance. Finally, the labeling results are refined by a Graph-Cuts-based optimization method to enforce ...
Chang Yuan, Gérard G. Medioni
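The carving step in the abstract can be illustrated with a minimal sketch: each voxel collects the colors it projects to in the different views, and voxels whose samples disagree strongly (large photo variance) are carved away. All names, the grayscale simplification, and the threshold below are illustrative assumptions, not the paper's actual implementation.

```python
import statistics

def photo_variance(samples):
    """Variance of per-view color samples for one voxel (grayscale here).

    Illustrative stand-in for the paper's photo-motion variance measure.
    """
    if len(samples) < 2:
        return 0.0
    return statistics.pvariance(samples)

def carve(voxels, threshold=50.0):
    """Label each voxel 'empty' if its photo variance exceeds the threshold,
    otherwise 'surface'. `voxels` maps a voxel id to its color samples.
    The threshold value is a hypothetical choice for demonstration."""
    labels = {}
    for vid, samples in voxels.items():
        labels[vid] = "empty" if photo_variance(samples) > threshold else "surface"
    return labels

# A photo-consistent voxel survives; an inconsistent one is carved away.
voxels = {
    "v0": [120.0, 122.0, 119.0],   # similar colors across views -> keep
    "v1": [30.0, 200.0, 90.0],     # large disagreement -> carve
}
print(carve(voxels))  # {'v0': 'surface', 'v1': 'empty'}
```

In the full method this deterministic pass only produces an initial labeling; the paper then refines it with a Graph-Cuts-based optimization, which the sketch above does not model.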