Video matting, or layer extraction, is a classic inverse problem in computer vision that involves the extraction of foreground objects, and the alpha mattes that describe their opacity, from a set of images. Modern approaches that work with natural backgrounds often require user-labelled "trimaps" that segment each image into foreground, background and unknown regions. For long sequences, the production of accurate trimaps can be time-consuming. In contrast, another class of approaches relies on automatic background extraction to automate the process, but existing techniques do not make use of spatiotemporal consistency, and cannot take account of operator hints such as trimaps. This paper presents a method inspired by natural image statistics that cleanly unifies these approaches. A prior is learnt that models the relationship between the spatiotemporal gradients in the image sequence and those in the alpha mattes. This is used in combination with a learnt foreground colour ...
Nicholas Apostoloff, Andrew W. Fitzgibbon
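
The sketch below illustrates one way such a learnt prior could be realised: pairing per-pixel spatiotemporal gradient magnitudes of the image sequence with those of known alpha mattes and fitting a density model over the pairs, which can then score how plausible a candidate matte's gradients are given the observed image gradients. The Gaussian-mixture choice, the function name `paired_gradient_features`, and the synthetic data are assumptions for illustration only; the abstract does not specify the form of the prior.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def paired_gradient_features(frames, alphas):
    """Pair per-pixel spatiotemporal gradient magnitudes of the image
    sequence with those of the alpha mattes.

    frames: (T, H, W) greyscale sequence; alphas: (T, H, W) mattes.
    Returns an (N, 2) array of [image_gradient, alpha_gradient] samples.
    """
    gt, gy, gx = np.gradient(frames.astype(float))
    at, ay, ax = np.gradient(alphas.astype(float))
    img_grad = np.sqrt(gt**2 + gy**2 + gx**2).ravel()
    alpha_grad = np.sqrt(at**2 + ay**2 + ax**2).ravel()
    return np.column_stack([img_grad, alpha_grad])


# Learn the prior from training sequences with ground-truth mattes
# (random placeholder data here, standing in for real training footage).
rng = np.random.default_rng(0)
train_frames = rng.random((8, 32, 32))
train_alphas = rng.random((8, 32, 32))
prior = GaussianMixture(n_components=4, random_state=0)
prior.fit(paired_gradient_features(train_frames, train_alphas))

# At inference, score candidate (image gradient, alpha gradient) pairs:
# higher log-likelihood means the candidate matte's gradients are more
# consistent with the gradients observed in the new sequence.
candidate_features = paired_gradient_features(train_frames, train_alphas)
log_prior = prior.score_samples(candidate_features)
```

In a full system this gradient prior would be combined with the other learnt terms the abstract mentions (such as the foreground colour model), with user trimaps entering as constraints on the same estimation problem rather than as a separate pipeline.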