In this paper we present a method for estimating the relative depth of objects in video sequences. The depth cues are obtained from the overlapping produced between objects when there is relative motion, and from the motion coherence between neighbouring regions. A relaxation labelling algorithm resolves conflicts between these cues and assigns every region to a depth level. The depth estimate is then used within a segmentation scheme: grey-level information produces a first partition, and the regions of this partition are merged on the basis of their depth level.
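As an illustration of the final merging step only, the following is a minimal sketch. It assumes every region of the first partition already carries a depth level, and that two regions are merged when they are neighbours and share the same level. That merging criterion, the function name `merge_regions_by_depth`, and the input layout are assumptions made for this example; they are not taken from the method itself.

```python
def merge_regions_by_depth(depth_of, adjacency):
    """Merge neighbouring regions that share a depth level (illustrative sketch).

    depth_of:  dict mapping region id -> depth level (hypothetical input layout)
    adjacency: iterable of (region_a, region_b) pairs of neighbouring regions
    Returns a dict mapping each region id -> id of its merged region.
    """
    # Union-find forest: every region starts as its own merged region.
    parent = {region: region for region in depth_of}

    def find(region):
        # Follow parent links to the representative, halving paths on the way.
        while parent[region] != region:
            parent[region] = parent[parent[region]]
            region = parent[region]
        return region

    for a, b in adjacency:
        # Assumed criterion: merge only neighbours at the same depth level.
        if depth_of[a] == depth_of[b]:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                # Keep the smaller id as representative so output is deterministic.
                parent[max(root_a, root_b)] = min(root_a, root_b)

    return {region: find(region) for region in depth_of}


# Five regions in a chain; regions 0, 1 and 4 are at depth 0, regions 2 and 3 at depth 1.
depth_of = {0: 0, 1: 0, 2: 1, 3: 1, 4: 0}
adjacency = [(0, 1), (1, 2), (2, 3), (3, 4)]
labels = merge_regions_by_depth(depth_of, adjacency)
print(labels)
```

In this toy case region 4 has the same depth level as regions 0 and 1 but is not connected to them through same-depth neighbours, so under the assumed criterion it remains a separate region.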