We describe a method for retrieving shots containing a
particular 2D human pose from unconstrained movie and
TV videos. The method involves first localizing the spatial
layout of the head, torso and limbs in individual frames
using pictorial structures, and associating these across the
frames of a shot by tracking. A feature vector describing the pose is
then constructed from the pictorial structure. Shots can be
retrieved either by querying on a single frame with the desired
pose, or through a pose classifier trained from a set of
pose examples.
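The two retrieval modes above share one step: compare a pose descriptor against every frame of every shot and rank the shots. The sketch below illustrates that step under loudly stated assumptions. The segment format, the `PARTS` list, the orientation-only descriptor, and the helpers `pose_descriptor`, `rank_shots`, `mean_template` and `make_pose` are all hypothetical. A nearest-mean template stands in for the trained pose classifier, which is likely a different model. A shot is scored by its best-matching frame.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical per-frame output of a pictorial-structure fit: each
# upper-body part is a line segment (x1, y1, x2, y2) in image pixels.
Segment = Tuple[float, float, float, float]
Pose = Dict[str, Segment]

# Assumed upper-body part set; the real model's parts may differ.
PARTS = ["torso", "head", "left_upper_arm", "right_upper_arm",
         "left_lower_arm", "right_lower_arm"]


def pose_descriptor(pose: Pose) -> List[float]:
    """Encode each part's orientation as (cos, sin) of its angle.

    Only part geometry is used, never pixel appearance; (cos, sin)
    avoids the discontinuity of raw angles at +/- pi.
    """
    vec: List[float] = []
    for part in PARTS:
        x1, y1, x2, y2 = pose[part]
        angle = math.atan2(y2 - y1, x2 - x1)
        vec.extend([math.cos(angle), math.sin(angle)])
    return vec


def distance(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two descriptors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def rank_shots(query: List[float],
               shots: Dict[str, List[Pose]]) -> List[Tuple[str, float]]:
    """Score each shot by its best-matching frame, closest shots first."""
    scored = [
        (shot_id, min(distance(query, pose_descriptor(f)) for f in frames))
        for shot_id, frames in shots.items()
    ]
    return sorted(scored, key=lambda item: item[1])


def mean_template(examples: List[Pose]) -> List[float]:
    """Stand-in for a trained classifier: the mean descriptor of the examples."""
    descs = [pose_descriptor(p) for p in examples]
    return [sum(col) / len(descs) for col in zip(*descs)]


def make_pose(arm_angle_deg: float) -> Pose:
    """Toy pose: vertical torso and head, all four arm parts at one angle."""
    rad = math.radians(arm_angle_deg)
    arm = (0.0, 0.0, math.cos(rad), math.sin(rad))
    pose: Pose = {"torso": (0.0, 0.0, 0.0, 1.0), "head": (0.0, -1.0, 0.0, 0.0)}
    for part in PARTS[2:]:
        pose[part] = arm
    return pose


# Query by a single frame: arms pointing straight down
# (image y grows downward, so 90 degrees means "down").
query = pose_descriptor(make_pose(90))
shots = {"shot_a": [make_pose(0), make_pose(10)],
         "shot_b": [make_pose(-90), make_pose(85)]}
best_by_frame = rank_shots(query, shots)[0][0]  # "shot_b"

# Query by a set of examples: average several training poses into one template.
template = mean_template([make_pose(80), make_pose(100)])
best_by_template = rank_shots(template, shots)[0][0]  # "shot_b"
```

Taking the minimum distance over frames means one well-matching frame is enough to retrieve a shot. That fits the intuition that a pose may appear only briefly, though the actual aggregation over a shot may differ.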
Our main contribution is an effective system for retrieving
people based on their pose. In particular, we propose
and investigate several pose descriptors that are independent of the
person, clothing, background and lighting. As a
second contribution, we improve on the performance of existing
methods for localizing upper-body layout in unconstrained
video.
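Part of why a layout-based descriptor can ignore clothing, background and lighting is that it is computed from part geometry alone, never from pixel values. The sketch below adds normalization so that the descriptor also does not depend on where the person stands in the frame or how large they appear. It assumes that part endpoints are expressed relative to the torso midpoint and divided by torso length. This normalization, the `Layout` format, and the helpers `normalized_layout` and `similarity_transform` are hypothetical illustrations, not the descriptors proposed in the paper.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]
# Hypothetical layout: each part is a segment given by its two endpoints.
Layout = Dict[str, Tuple[Point, Point]]


def normalized_layout(parts: Layout) -> List[float]:
    """Express every part endpoint relative to the torso.

    Coordinates are shifted to the torso midpoint and divided by torso
    length, so the vector does not change with the person's position or
    apparent size in the frame. Pixel colours are never read.
    """
    (tx1, ty1), (tx2, ty2) = parts["torso"]
    cx, cy = (tx1 + tx2) / 2, (ty1 + ty2) / 2
    scale = math.hypot(tx2 - tx1, ty2 - ty1)
    if scale == 0:
        raise ValueError("degenerate torso segment")
    vec: List[float] = []
    for name in sorted(parts):  # fixed part order across frames
        for x, y in parts[name]:
            vec.extend([(x - cx) / scale, (y - cy) / scale])
    return vec


def similarity_transform(parts: Layout, s: float, dx: float, dy: float) -> Layout:
    """Scale by s and translate by (dx, dy): the same pose, elsewhere and larger."""
    return {name: ((s * a[0] + dx, s * a[1] + dy), (s * b[0] + dx, s * b[1] + dy))
            for name, (a, b) in parts.items()}


person: Layout = {
    "torso": ((100.0, 80.0), (100.0, 160.0)),
    "head": ((100.0, 50.0), (100.0, 80.0)),
    "left_upper_arm": ((90.0, 85.0), (60.0, 110.0)),
    "right_upper_arm": ((110.0, 85.0), (140.0, 60.0)),
}
moved = similarity_transform(person, s=2.5, dx=40.0, dy=-7.0)

# The same pose, moved and enlarged, yields the same descriptor.
same = all(math.isclose(u, v, abs_tol=1e-9)
           for u, v in zip(normalized_layout(person), normalized_layout(moved)))
```

Normalizing by the torso rather than by the image size is a design choice: the torso is usually the most reliably localized part, so it is a natural reference frame for the limbs.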
We compare pose retrieval based on spatial layout to a baseline
method where poses are retri...
Andrew Zisserman, Manuel J. Marín-Jiménez