We address the classic problems of detection, segmentation
and pose estimation of people in images with a novel
definition of a part, a poselet. We postulate two criteria:
(1) it should be easy to find a poselet given an input image;
(2) it should be easy to localize the 3D configuration of the
person conditioned on the detection of a poselet. To permit
this, we have built a new dataset, H3D, of annotations of
humans in 2D photographs with 3D joint information,
inferred using anthropometric constraints. This enables us to
implement a data-driven search procedure for finding poselets
that are tightly clustered in both 3D joint configuration space
and 2D image appearance. The algorithm discovers poselets
that correspond to frontal and profile faces, pedestrians, and
head-and-shoulder views, among others.
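
To make "tightly clustered in 3D joint configuration space" concrete, the following is a minimal sketch of ranking annotated examples by their similarity to a seed configuration. It is an illustration under stated assumptions, not the search procedure of this work: the distance used here (mean squared distance between centroid-centered joints), the data layout, and the names `center`, `config_distance`, and `nearest_configurations` are all hypothetical.

```python
from math import fsum

# A joint is an (x, y, z) position; a configuration is an ordered list of joints.
# This layout is an assumption made for illustration only.
Joint = tuple[float, float, float]


def center(joints: list[Joint]) -> list[Joint]:
    """Translate the joints so that their centroid sits at the origin."""
    n = len(joints)
    cx = fsum(j[0] for j in joints) / n
    cy = fsum(j[1] for j in joints) / n
    cz = fsum(j[2] for j in joints) / n
    return [(x - cx, y - cy, z - cz) for x, y, z in joints]


def config_distance(a: list[Joint], b: list[Joint]) -> float:
    """Mean squared distance between corresponding joints after centering.

    Assumed distance for this sketch: it ignores translation but not
    scale or rotation.
    """
    if len(a) != len(b):
        raise ValueError("configurations must have the same number of joints")
    ca, cb = center(a), center(b)
    total = fsum((p - q) ** 2 for ja, jb in zip(ca, cb) for p, q in zip(ja, jb))
    return total / len(a)


def nearest_configurations(
    seed: list[Joint], candidates: list[list[Joint]], k: int
) -> list[int]:
    """Return the indices of the k candidates closest to the seed."""
    order = sorted(
        range(len(candidates)),
        key=lambda i: config_distance(seed, candidates[i]),
    )
    return order[:k]


if __name__ == "__main__":
    seed = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    candidates = [
        [(5.0, 5.0, 5.0), (6.0, 5.0, 5.0), (5.0, 6.0, 5.0)],  # seed, translated
        [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0)],  # seed, scaled by 2
        [(0.0, 0.0, 0.0), (0.0, 0.0, 3.0), (0.0, 3.0, 0.0)],  # different layout
    ]
    print(nearest_configurations(seed, candidates, k=2))
```

In this toy input the translated copy of the seed ranks first, because centering removes translation. A candidate set selected this way would still need to be checked for consistent 2D image appearance, which this sketch does not model.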
Each poselet provides examples for training a linear
SVM classifier, which can then be run over the image in a
multiscale scanning mode. The outputs of these posel...