In this paper, we show that in a multi-camera context, we can effectively handle occlusions in real time at each frame independently, even when the only available data comes from the binary output of a simple blob detector, and the number of individuals present is a priori unknown. We start from occupancy probability estimates in a top view and rely on a generative model to yield probability images to be compared with the actual input images. We then refine the estimates so that the probability images match the binary input images as well as possible. We demonstrate the quality of our results on several sequences involving complex occlusions.
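
To make the estimate, synthesize, compare, and refine loop concrete, the following is a minimal sketch and not the method developed in the paper. It rests on several assumptions that are ours: ground locations are occupied independently, each location projects to a fixed set of pixels (a hypothetical `masks` list covering the concatenated images of two cameras), and the refinement is plain gradient descent on the squared pixel error, swapped in as a simple stand-in. The names `synthesize` and `refine`, the step size, and the toy layout are all illustrative.

```python
def synthesize(q, masks, n_pixels):
    """Probability that each pixel is foreground, given occupancy probabilities q.

    Assumes independent occupancies: a pixel is background only if every
    location whose silhouette covers it is empty.
    """
    background = [1.0] * n_pixels
    for q_k, mask in zip(q, masks):
        for p in mask:
            background[p] *= 1.0 - q_k
    return [1.0 - b for b in background]


def refine(binary, masks, n_iters=200, lr=0.05, eps=1e-6):
    """Gradient descent on the squared error between synthetic and binary images.

    This is an illustrative stand-in for the refinement step, not the
    paper's own update rule.
    """
    n_pixels = len(binary)
    q = [0.5] * len(masks)  # uninformative starting estimate
    for _ in range(n_iters):
        synth = synthesize(q, masks, n_pixels)
        grad = [0.0] * len(q)
        for k, mask in enumerate(masks):
            for p in mask:
                # d synth[p] / d q[k]: product of (1 - q[j]) over the
                # other locations whose silhouettes also cover pixel p.
                others = 1.0
                for j, other in enumerate(masks):
                    if j != k and p in other:
                        others *= 1.0 - q[j]
                grad[k] += 2.0 * (synth[p] - binary[p]) * others
        # Keep every estimate a valid probability.
        q = [min(1.0 - eps, max(eps, q_k - lr * g)) for q_k, g in zip(q, grad)]
    return q


# Toy scene (hypothetical): pixels 0-5 belong to camera A, 6-11 to camera B.
masks = [
    [0, 1, 2, 6, 7],    # location 0
    [1, 2, 3, 8, 9],    # location 1, hidden behind its neighbours in camera A
    [3, 4, 5, 10, 11],  # location 2
]
# Binary blob-detector output when only locations 0 and 2 are occupied.
binary = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1]
q = refine(binary, masks)
```

In this toy scene camera A alone cannot tell whether location 1 is occupied, because its silhouette is entirely covered by those of its neighbours. The empty pixels in camera B pull the estimate for location 1 down, while the estimates for locations 0 and 2 rise toward one.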