Decomposing video frames into coherent two-dimensional motion layers is a powerful method for representing videos. Such a representation provides an intermediate description that enables applications such as object tracking, video summarization and visualization, video insertion, and sprite-based video compression. Previous work on motion layer analysis has largely concentrated on two-frame or multiframe batch formulations. The temporal coherency of motion layers and the domain constraints on shapes have not been exploited. This paper introduces a complete dynamic motion layer representation in which spatial and temporal constraints on shape, motion, and layer appearance are modeled and estimated in a maximum a posteriori (MAP) framework using the generalized expectation-maximization (EM) algorithm. In order to limit the computational complexity of tracking arbitrarily shaped layer ownership, we propose a shape prior that parameterizes the representation of shape and prevents motion la...
Hai Tao, Harpreet S. Sawhney, Rakesh Kumar