Optical motion capture can be framed as an inference problem: given the data produced by a set of cameras, the aim is to recover the hidden state, which in this case encodes the posture of the subject’s body. The problem is difficult because the likelihood distribution is multi-modal, the state-space is extremely high-dimensional, and the local modes have narrow regions of support. Further difficulties stem from the volume of the data, from how hard it is to extract useful visual cues from it, and from how informative those cues turn out to be. Several algorithms use stochastic methods to extract the hidden state but, although highly parallelisable in theory, such methods incur a heavy computational cost even on today’s computers. In this paper we assume a set of pre-calibrated cameras and extract only the subject’s silhouette as a visual cue. In order to describe the 2D silhouette data we define a 2D model co...