In this paper we present a framework for computing depth images at interactive rates. Our approach combines time-of-flight (TOF) range data with stereo vision. We use a per-frame confidence map extracted from the TOF sensor data in two ways to improve the disparity estimation in the stereo part: first, together with the TOF range data, to initialize and constrain the disparity search range; and second, together with the color image information, to segment the data into depth-continuous regions, enabling the use of adaptive windows in the disparity search. The resulting depth images are more accurate than those produced by either sensor alone. In an example application, we use the depth map to initialize the z-buffer so that virtual objects can be occluded by real objects in an augmented reality scenario.
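As a minimal sketch of the first use of the confidence map, the following Python snippet illustrates how a per-pixel disparity search interval could be derived from TOF range data and confidence. It is our own illustration rather than the paper's implementation: the function name, the linear confidence-to-radius mapping, and the parameter values are hypothetical, and a rectified stereo pair with known focal length (in pixels) and baseline (in metres) is assumed.

```python
import numpy as np

def disparity_search_range(tof_depth, confidence, focal_px, baseline_m,
                           min_radius=1.0, max_radius=16.0):
    """Per-pixel disparity search interval from TOF data (hypothetical sketch).

    tof_depth  : (H, W) TOF range values in metres (0 marks invalid samples).
    confidence : (H, W) values in [0, 1]; 1 means a fully reliable TOF sample.
    focal_px   : focal length of the rectified stereo pair, in pixels.
    baseline_m : stereo baseline, in metres.
    Returns (d_init, d_lo, d_hi): initial disparity and search bounds.
    """
    # Standard pinhole relation for a rectified pair: disparity = f * B / Z.
    d_init = np.where(tof_depth > 0,
                      focal_px * baseline_m / np.maximum(tof_depth, 1e-6),
                      0.0)

    # High-confidence pixels get a narrow search interval around the TOF
    # initialization; low-confidence pixels get a wide one (linear mapping
    # chosen for illustration only).
    radius = max_radius - confidence * (max_radius - min_radius)

    d_lo = np.clip(d_init - radius, 0.0, None)
    d_hi = d_init + radius
    return d_init, d_lo, d_hi
```

A stereo matcher would then evaluate its cost function only over [d_lo, d_hi] at each pixel, which is what makes the TOF initialization pay off at interactive rates.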