Abstract. In this work, we introduce a model-based approach to extracting the silhouettes of people in motion from stereo video sequences. To this end, we extend a purely stereo-based approach to tracking people proposed in earlier work. This approach is based on an implicit surface model of the body. It lets us accurately predict the silhouettes' locations and, therefore, detect them more robustly. In turn, these silhouettes allow us to fit the model more precisely. This allows effective motion recovery, even when people are filmed against a cluttered, unknown background. This is in contrast to many recent approaches that require silhouette contours to be readily obtainable using relatively simple methods, such as background subtraction, that typically require either engineering the scene or making strong assumptions. We demonstrate our approach's effectiveness using complex and fully three-dimensional motion sequences where the ability to combine stereo and silhouette informati...