This paper presents a method for learning and recognizing generic object categories using part-based spatial models. The models are multiscale, with a scene component that specifies relationships between the object and the surrounding scene context, and an object component that specifies relationships between parts of the object. The underlying graphical model forms a tree structure, with a star topology for both the contextual and object components. The models are learned under a partially supervised paradigm: each training image is labeled with bounding boxes indicating the overall location of object instances, but the parts or regions of the objects and scene are not specified. The parts, regions, and spatial relationships are learned automatically. We evaluate the method on the detection task of the PASCAL 2006 Visual Object Classes Challenge dataset, in which objects must be correctly localized. Our results demonstrate better overall performance than those of previously reported t...
David J. Crandall, Daniel P. Huttenlocher