This paper presents a novel multi-body multi-view stereo method that simultaneously recovers dense depth maps and performs segmentation from a monocular image sequence. Unlike traditional multi-view stereo approaches, which generally handle a single static scene or object, we show that depth estimation and segmentation can be jointly modeled and globally solved in an energy minimization framework for commonly encountered scenes containing multiple independently moving rigid objects. Our major contribution is a new multi-body stereo model that integrates color, geometry, and layer constraints for spatio-temporal depth recovery and automatic object segmentation. A two-pass optimization scheme is proposed to progressively update the estimates. We apply our method to a variety of challenging examples.