Different data acquisition methods are tailored to extracting particular characteristics of a scene, and combining their results yields a more robust scene description. A method is presented that uses supervised classification to fuse perceptual groupings extracted by color-based segmentation with depth information from stereo. Merging the data from these two acquisition modules produces a spatially coherent blend of smooth regions and detail in an image. Depth cues are used to limit the area of interest in the scene and to improve perceptual grouping by resolving undersegmentation and oversegmentation of the original images. The complexity of the algorithm does not exceed that of the individual acquisition modules. The resulting scene description can then be fed to an object recognition module for scene interpretation.
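
As a rough illustration of the fusion idea, the sketch below classifies each color-segmented region from its mean color and mean depth, then uses a depth threshold to limit the area of interest. The abstract does not specify the classifier, the features, or how depth limits the scene, so the nearest-centroid classifier, the four-component feature vector, the `max_depth` cutoff, and all function names are assumptions made for illustration only, not the method itself.

```python
import numpy as np

def region_features(labels, image, depth):
    """One feature row [mean R, mean G, mean B, mean depth] per segmented region."""
    ids = np.unique(labels)  # sorted region ids
    feats = np.array([
        np.concatenate([image[labels == r].mean(axis=0), [depth[labels == r].mean()]])
        for r in ids
    ])
    return ids, feats

def fit_centroids(feats, classes):
    """Assumed supervised step: store one mean feature vector per labeled class."""
    names = np.unique(classes)
    centroids = np.array([feats[classes == c].mean(axis=0) for c in names])
    return names, centroids

def classify(feats, names, centroids):
    """Assign each region to the class with the nearest centroid (Euclidean)."""
    dist = np.linalg.norm(feats[:, None, :] - centroids[None, :, :], axis=2)
    return names[dist.argmin(axis=1)]

def fuse(labels, image, depth, names, centroids, max_depth):
    """Relabel pixels with their region's predicted class; mask distant pixels as -1."""
    ids, feats = region_features(labels, image, depth)
    pred = classify(feats, names, centroids)
    # ids is sorted, so searchsorted maps each pixel's region id to its row in pred.
    fused = pred[np.searchsorted(ids, labels)]
    fused[depth > max_depth] = -1  # hypothetical depth cutoff for the area of interest
    return fused
```

In this sketch, oversegmented regions that receive the same class collapse into one label, which is one simple way depth and color can jointly repair a segmentation. Color and depth are used unscaled here; in practice the features would likely need normalization before distances are compared.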