We present a method for automatically extracting spatio-temporal descriptions of moving objects from synchronized and calibrated multi-view sequences. The object is modeled by a time-varying multi-resolution subdivision surface that is fitted to the image data using spatio-temporal multi-view stereo information as well as contour constraints. Stereo information is exploited by computing the normalized correlation between corresponding spatio-temporal image trajectories of surface patches, while contour information is obtained by incrementally segmenting the viewing volume into object and background. We globally optimize the shape of the spatio-temporal surface in a coarse-to-fine manner using the multi-resolution structure of the subdivision mesh. The presented method incorporates the available image information in a unified framework and automatically reconstructs accurate spatio-temporal representations of complex non-rigidly moving objects.
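As an illustration of the stereo term, the following minimal sketch computes a normalized correlation score between the image trajectories of one surface patch as seen in two views. It is not the implementation described here: the function names `normalized_correlation` and `trajectory_score`, the representation of a trajectory as a list of per-frame intensity samples, and the choice to return 0 for textureless patches are all assumptions made for the example.

```python
import math
from typing import Sequence


def normalized_correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Zero-mean normalized correlation of two equally long sample vectors."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("sample vectors must be non-empty and of equal length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    numerator = sum(x * y for x, y in zip(da, db))
    denominator = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    # Assumption: a textureless (constant) patch carries no stereo evidence.
    return numerator / denominator if denominator > 0.0 else 0.0


def trajectory_score(
    traj_a: Sequence[Sequence[float]], traj_b: Sequence[Sequence[float]]
) -> float:
    """Correlate two patch trajectories, each given as per-frame sample lists.

    Samples from all frames are concatenated, so the score is computed over
    the whole spatio-temporal trajectory rather than frame by frame.
    """
    flat_a = [v for frame in traj_a for v in frame]
    flat_b = [v for frame in traj_b for v in frame]
    return normalized_correlation(flat_a, flat_b)


# Hypothetical data: one patch sampled at 4 points over 2 frames in two views.
view_a = [[10.0, 20.0, 30.0, 40.0], [12.0, 22.0, 32.0, 42.0]]
# Second view differs only by gain and offset, which the score ignores.
view_b = [[2.0 * v + 5.0 for v in frame] for frame in view_a]
score = trajectory_score(view_a, view_b)
```

Because the samples are mean-subtracted and normalized, the score is insensitive to per-view gain and offset differences, which is the usual reason for preferring normalized correlation over a raw intensity difference when comparing cameras.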