Detecting people is one of the foremost problems in video indexing, browsing and retrieval. The main difficulty is the large variation in appearance caused by action, clothing, illumination, viewpoint and scale. Our work aims to find people in static video frames using learned models of the appearance of human body parts (head, limbs, hands) and of the geometry of their assemblies. It is based on Forsyth & Fleck's general methodology of `body plans' and Felzenszwalb & Huttenlocher's dynamic programming approach for efficiently assembling candidate parts into `pictorial structures'. The simplistic part detectors used in these works made restrictive photometric assumptions that severely limited their practical applicability. Instead, we learn dedicated detectors for each body part using Support Vector Machines (SVMs) and the recently proposed Relevance Vector Machines (RVMs). In the past, SVMs have been successfully used to detect whole pedestrians in st...