This paper addresses the problem of segmenting a video shot into a (still) background mosaic and one or more moving foreground objects. The method is based on ego-motion compensation and background estimation. To cope with sequences in which occluding objects persist in the same position for a considerable portion of time, the paper concentrates on a robust background estimation method. First, the sequence is subdivided into patches that are clustered along the time-line to narrow down the number of background candidates. Then the background is grown incrementally by selecting, at each step, the best continuation of the current background according to the principles of visual grouping. The method rests on sound principles in all its stages, and only a few intelligible parameters are needed. Experiments with real sequences illustrate the approach.
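
To make the temporal clustering step concrete, the following is a minimal sketch of how the patches observed at one spatial location across the (already motion-compensated) frames might be grouped into a small set of background candidates. It is not the paper's algorithm: the mean-squared-difference distance, the greedy first-fit grouping, the `threshold` parameter, and all function names are assumptions made purely for illustration, and the incremental growing step is not covered.

```python
from typing import List, Sequence

# A patch is represented here as a flat sequence of pixel intensities
# (an assumption of this sketch, not a detail from the paper).
Patch = Sequence[float]


def patch_distance(a: Patch, b: Patch) -> float:
    """Mean squared difference between two equally sized patches."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cluster_along_time(patches: List[Patch], threshold: float) -> List[List[int]]:
    """Greedily group frame indices whose patches look alike.

    Each patch joins the first cluster whose representative (its first
    member) lies within `threshold`; otherwise it starts a new cluster.
    """
    clusters: List[List[int]] = []
    for t, patch in enumerate(patches):
        for members in clusters:
            if patch_distance(patches[members[0]], patch) <= threshold:
                members.append(t)
                break
        else:
            # No existing cluster is close enough: start a new one.
            clusters.append([t])
    return clusters


def background_candidates(patches: List[Patch], threshold: float) -> List[List[float]]:
    """Return one averaged patch per temporal cluster."""
    candidates = []
    for members in cluster_along_time(patches, threshold):
        size = len(patches[members[0]])
        candidates.append(
            [sum(patches[t][i] for t in members) / len(members) for i in range(size)]
        )
    return candidates


# Five frames of a two-pixel patch: frames 0, 1, 3 show one appearance,
# frames 2 and 4 show another (for example, an occluding object).
frames = [[10, 10], [11, 10], [200, 200], [10, 11], [201, 199]]
clusters = cluster_along_time(frames, threshold=4.0)
candidates = background_candidates(frames, threshold=4.0)
```

In this toy input the five observations collapse to two candidates, which is the sense in which clustering narrows down the choices before a background is selected at each location.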