Segmentation and tracking of multiple humans in crowded situations is made difficult by inter-object occlusion. We propose a model-based approach that interprets the image observations as multiple partially occluded human hypotheses in a Bayesian framework. We define a joint image likelihood for multiple humans based on the appearance of the humans, the visibility of the body obtained by occlusion reasoning, and foreground/background separation. The optimal solution is obtained by using an efficient sampling method, data-driven Markov chain Monte Carlo (DDMCMC), which uses image observations to form proposal probabilities. Knowledge of various aspects, including human shape, camera model, and image cues, is integrated in one theoretically sound framework. We present experimental results and a quantitative evaluation, demonstrating that the resulting approach is effective on very challenging data.
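To make the idea of data-driven proposals concrete, the following is a minimal toy sketch and not the method described in this paper. It assumes a one-dimensional binary foreground mask and a single hypothesis of known width, and it searches for the hypothesis center with a Metropolis-Hastings independence sampler whose proposal distribution is derived from the mask itself. All names (`log_posterior`, `data_driven_proposal`, `ddmcmc_map`) and the 0.9/0.1 pixel-agreement probabilities are illustrative assumptions.

```python
import math
import random


def log_posterior(center, mask, half_width):
    """Toy log-posterior (assumed form): each pixel agrees with the
    hypothesis if it is foreground inside the span or background outside."""
    score = 0.0
    for i, fg in enumerate(mask):
        inside = abs(i - center) <= half_width
        score += math.log(0.9 if inside == bool(fg) else 0.1)
    return score


def data_driven_proposal(mask, eps=0.05):
    """Proposal over centers, biased toward foreground pixels.
    eps keeps every position reachable so the chain stays valid."""
    weights = [fg + eps for fg in mask]
    total = sum(weights)
    return [w / total for w in weights]


def ddmcmc_map(mask, half_width, n_iter=2000, seed=0):
    """Return the highest-posterior center visited by the chain."""
    rng = random.Random(seed)
    q = data_driven_proposal(mask)
    positions = list(range(len(mask)))

    current = rng.choices(positions, weights=q)[0]
    current_lp = log_posterior(current, mask, half_width)
    best, best_lp = current, current_lp

    for _ in range(n_iter):
        candidate = rng.choices(positions, weights=q)[0]
        cand_lp = log_posterior(candidate, mask, half_width)
        # Metropolis-Hastings log-ratio for an independence proposal q:
        # p(x') q(x) / (p(x) q(x')).
        log_alpha = (cand_lp - current_lp) + math.log(q[current]) - math.log(q[candidate])
        if rng.random() < math.exp(min(0.0, log_alpha)):
            current, current_lp = candidate, cand_lp
            if current_lp > best_lp:
                best, best_lp = current, current_lp
    return best


if __name__ == "__main__":
    # Foreground occupies indices 10..14, so the ideal center is 12.
    mask = [0] * 10 + [1] * 5 + [0] * 10
    print(ddmcmc_map(mask, half_width=2))
```

Because the proposal concentrates on foreground pixels, the chain tests plausible centers far more often than a uniform proposal would, while the acceptance ratio corrects for that bias. A full multi-human system would additionally need moves that add, remove, and alter hypotheses, which this sketch omits.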