This work presents a marker-less motion capture system that smoothly adapts a generic model mesh to the individual shape of a tracked person. The adaptation relies only on extracted silhouettes, so the 3D model of the tracked person is learned during the capture process itself. From a sparse set of 2D-3D correspondences, computed along normal directions in image sequences from multiple cameras, a Laplacian mesh editing tool generates the final adapted model. As the number of frames increases, a temporal coherence approach minimizes the effects of insufficient correspondence data and ensures smooth adaptation results. Finally, we present experiments on non-optimal data that show the robustness of our algorithm.
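
To illustrate the kind of Laplacian mesh editing step referred to above, the following is a minimal sketch and not the system's actual implementation. It assumes a uniform graph Laplacian, soft positional constraints solved by linear least squares, and correspondences already lifted to 3D target positions; the function names `uniform_laplacian` and `laplacian_edit` and the parameter `weight` are hypothetical.

```python
import numpy as np

def uniform_laplacian(n_vertices, edges):
    """Uniform graph Laplacian L = I - D^-1 A (assumes no isolated vertices)."""
    A = np.zeros((n_vertices, n_vertices))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    deg = A.sum(axis=1)
    return np.eye(n_vertices) - A / deg[:, None]

def laplacian_edit(vertices, edges, handle_ids, handle_targets, weight=10.0):
    """Deform a mesh so that a sparse set of handle vertices moves toward
    target positions while the differential coordinates are preserved.

    Solves min ||L V' - delta||^2 + weight^2 * sum_i ||v'_i - c_i||^2
    in the least-squares sense (assumed formulation, see lead-in).
    """
    vertices = np.asarray(vertices, dtype=float)
    targets = np.asarray(handle_targets, dtype=float)
    n = len(vertices)
    L = uniform_laplacian(n, edges)
    delta = L @ vertices                      # differential coordinates
    C = np.zeros((len(handle_ids), n))        # one constraint row per handle
    C[np.arange(len(handle_ids)), handle_ids] = weight
    A = np.vstack([L, C])
    b = np.vstack([delta, weight * targets])
    new_vertices, *_ = np.linalg.lstsq(A, b, rcond=None)
    return new_vertices

# Toy example: a 5-vertex polyline whose last vertex is pulled upward.
V = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0], [4, 0, 0]], float)
E = [(0, 1), (1, 2), (2, 3), (3, 4)]
deformed = laplacian_edit(V, E, [0, 4], [[0, 0, 0], [4, 1, 0]])
```

Because only a few vertices are constrained, the remaining vertices follow from the preserved differential coordinates, which is why a sparse set of correspondences can be enough to drive the whole mesh in a sketch like this.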