This paper proposes a method for capturing the performance
of a human or an animal from a multi-view video
sequence. Given an articulated template model and silhouettes
from a multi-view image sequence, our approach recovers
not only the movement of the skeleton, but also the
possibly non-rigid temporal deformation of the 3D surface.
While large scale deformations or fast movements are captured
by the skeleton pose and approximate surface skinning,
true small scale deformations or non-rigid garment
motion are captured by fitting the surface to the silhouette.
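The abstract does not specify the skinning formulation; the following is a hedged illustration using linear blend skinning, a common choice for approximate surface skinning, and should not be read as the paper's exact method. All function and parameter names are hypothetical.

```python
import numpy as np

def linear_blend_skinning(rest_vertices, bone_transforms, weights):
    """Deform rest-pose vertices by a weighted blend of bone transforms.

    rest_vertices:   (V, 3) rest-pose surface vertices
    bone_transforms: (B, 4, 4) per-bone rigid transforms (rest -> posed)
    weights:         (V, B) skinning weights, each row summing to 1
    """
    V = rest_vertices.shape[0]
    homo = np.hstack([rest_vertices, np.ones((V, 1))])        # (V, 4) homogeneous
    # Transform every vertex by every bone: result is (B, V, 4)
    per_bone = np.einsum('bij,vj->bvi', bone_transforms, homo)
    # Blend the per-bone results with the skinning weights: (V, 4)
    blended = np.einsum('vb,bvi->vi', weights, per_bone)
    return blended[:, :3]
```

Such a skinned surface only approximates the large-scale deformation driven by the skeleton; the residual small-scale and garment motion is then recovered by the silhouette fitting described above.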
We further propose a novel optimization scheme for
skeleton-based pose estimation that exploits the skeleton’s
tree structure to split the optimization problem into a local
one and a lower-dimensional global one. We show on various
sequences that our approach accurately captures the 3D motion
of animals and humans even in the case of rapid
movements and loose apparel such as skirts.
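To illustrate the general idea of the local/global split (not the paper's actual algorithm), the sketch below first solves small per-joint problems by traversing the kinematic tree from the root and then jointly refines a lower-dimensional parameter vector for the whole pose. The Skeleton interface, cost functions, and optimizer choice are all assumptions made for the example.

```python
import numpy as np
from scipy.optimize import minimize

def local_then_global(skeleton, joint_costs, global_cost, x0):
    """skeleton.root: root joint id; skeleton.children(j): child joints of j;
    skeleton.param_slice(j): indices of joint j's few DoF in the pose vector.
    joint_costs[j](theta_j): scalar fitting cost for joint j's parameters.
    global_cost(x): scalar cost over the full (lower-dimensional) pose vector."""
    x = x0.copy()
    # Local pass: optimise each joint's parameters in tree order, so every
    # joint sees its already-updated ancestors.
    stack = [skeleton.root]
    while stack:
        j = stack.pop()
        sl = skeleton.param_slice(j)
        res = minimize(joint_costs[j], x[sl], method='BFGS')
        x[sl] = res.x
        stack.extend(skeleton.children(j))
    # Global pass: a single joint refinement over the whole pose vector.
    return minimize(global_cost, x, method='BFGS').x
```

The point of such a split is that each local problem has only a few degrees of freedom and is cheap to solve, while the subsequent global step corrects interactions between limbs without searching the full high-dimensional pose space from scratch.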