Extracting a computer model of a real scene from a sequence of views is one of the most challenging and fundamental problems in computer vision. Stereo vision algorithms can extract from the images a sparse 3D point cloud lying on the scene surfaces. However, computing an accurate mesh of the scene from such poor-quality data (noisy and sparse points) is very difficult. Here we describe a simple yet original approach that uses both the point cloud extracted by stereo vision and the calibrated images. Our method is a three-stage process. The first stage merges, filters and smooths the input 3D points. The second stage builds a triangular depth-map for each calibrated image and fuses the set of depth-maps into a triangle soup that minimizes violations of size and visibility constraints. The third stage computes a mesh from the triangle soup using a reconstruction method that combines restricted Delaunay triangulation and Delaunay refinement. Categories and Subject Descriptors:...