We present a new approach to iteratively estimate both
a high-quality depth map and an alpha matte from a single image
or a video sequence. Scene depth, being insensitive
to illumination changes, color similarity, and motion
ambiguity, provides a natural and robust cue for foreground/
background segmentation, a prerequisite for matting.
Alpha mattes, on the other hand, encode rich
information near object boundaries, where both passive and active
sensing methods perform poorly. We develop a method
that exploits this complementary nature of scene depth and
alpha matte so that each enhances the quality of the other. We formulate
depth inference as a global optimization problem
in which information from passive stereo, an active range sensor,
and the matte is merged. The resulting depth map is used in turn to enhance
the matting. In addition, we extend this approach to
video matting by incorporating temporal coherence, which
reduces flickering in the composite video. We show that
these techniques lead to improved...
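To make the fusion idea concrete, the sketch below poses depth fusion as a simple quadratic energy with per-pixel data terms tying the solution to the stereo and range-sensor measurements plus a 4-neighbor smoothness term, solved by Jacobi iterations. This is an illustration only, not the paper's actual formulation; the weights `w_s`, `w_t`, and `lam` are hypothetical parameters.

```python
import numpy as np

def fuse_depth(d_stereo, d_tof, w_s, w_t, lam=1.0, iters=200):
    """Jacobi iterations minimizing a quadratic fusion energy:

        E(D) = sum_p [ w_s (D_p - stereo_p)^2 + w_t (D_p - tof_p)^2 ]
               + lam * sum_{p~q} (D_p - D_q)^2   over 4-neighbors.

    Setting dE/dD_p = 0 gives a per-pixel closed-form update in
    terms of the neighbor sum, which we iterate to convergence.
    """
    d = (d_stereo + d_tof) / 2.0  # simple initial guess
    for _ in range(iters):
        # 4-neighbor sum via edge-replicated shifts
        up    = np.vstack([d[:1], d[:-1]])
        down  = np.vstack([d[1:], d[-1:]])
        left  = np.hstack([d[:, :1], d[:, :-1]])
        right = np.hstack([d[:, 1:], d[:, -1:]])
        nb_sum = up + down + left + right
        # per-pixel update balancing data terms against smoothness
        d = (w_s * d_stereo + w_t * d_tof + lam * nb_sum) / (w_s + w_t + 4 * lam)
    return d
```

In a fuller version the weights would vary per pixel, e.g. derived from stereo matching confidence, sensor noise, and the matte near boundaries, so that each cue dominates where it is most reliable.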
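The temporal-coherence idea can likewise be illustrated, again as a sketch rather than the paper's method, by exponentially smoothing each pixel's alpha value across frames, which damps the frame-to-frame jumps that appear as flicker in the composite; the blending factor `beta` is a hypothetical parameter.

```python
import numpy as np

def smooth_alpha_sequence(alphas, beta=0.8):
    """Exponential moving average over per-frame alpha mattes.

    alphas: list of 2-D arrays in [0, 1], one matte per frame.
    beta:   weight on the running estimate; higher means smoother
            output but more lag at fast-moving boundaries.
    """
    smoothed = [alphas[0].astype(float)]
    for a in alphas[1:]:
        # blend the new matte with the running estimate
        smoothed.append(beta * smoothed[-1] + (1.0 - beta) * a)
    return smoothed
```

A matting system would typically restrict such smoothing to pixels whose depth indicates a static boundary, so that genuine motion is not blurred away.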