We present an approach for object recognition that combines detection and segmentation within an efficient hypothesize/test framework. Scanning-window template classifiers are the current state of the art for many object classes, such as faces, cars, and pedestrians. Such approaches, though quite successful, can be hindered by their lack of explicit encoding of object shape and structure; one might, for example, find faces in trees. We adopt the following strategy. We first use these systems as attention mechanisms, generating many possible object locations by tuning them for few missed detections at the cost of many false positives. At each hypothesized detection, we compute a local figure-ground segmentation using a window of slightly larger extent than that used by the classifier. This segmentation task is guided by top-down knowledge. We learn offline from training data those segmentations that are consistent with true positives. We then prune away those hypotheses with bad segmentations. We show...
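As a rough illustration of the hypothesize/test loop described above, the following sketch proposes windows with a permissive score threshold, segments a slightly larger window around each proposal, and keeps only the proposals whose segmentation agrees with a set of exemplar masks. Everything specific here is an assumption made for illustration: the function names, the mean-threshold segmenter, the intersection-over-union consistency measure, and the parameters `win`, `pad`, `score_thresh`, and `seg_thresh` are hypothetical stand-ins, not the method or the learned model referred to in the text.

```python
import numpy as np


def hypothesize(scores, score_thresh):
    """Return the top-left (row, col) of every window scoring above a permissive threshold."""
    rows, cols = np.nonzero(scores > score_thresh)
    return list(zip(rows.tolist(), cols.tolist()))


def segment(patch):
    """Stand-in figure-ground segmentation: pixels brighter than the patch mean are figure."""
    return patch > patch.mean()


def consistency(mask, exemplars):
    """Best intersection-over-union between a mask and any exemplar mask (an assumed measure)."""
    best = 0.0
    for exemplar in exemplars:
        union = np.logical_or(mask, exemplar).sum()
        if union > 0:
            overlap = np.logical_and(mask, exemplar).sum()
            best = max(best, float(overlap / union))
    return best


def detect(image, scores, exemplars, win, pad, score_thresh, seg_thresh):
    """Keep the hypotheses whose local segmentation is consistent with the exemplars."""
    # Pad the image so the enlarged window is defined at the borders.
    padded = np.pad(image, pad, mode="edge")
    size = win + 2 * pad  # segmentation window is slightly larger than the classifier window
    kept = []
    for r, c in hypothesize(scores, score_thresh):
        # Row r of the padded image corresponds to row r - pad of the original.
        patch = padded[r:r + size, c:c + size]
        if consistency(segment(patch), exemplars) >= seg_thresh:
            kept.append((r, c))
    return kept
```

In this sketch `scores` is assumed to hold one classifier score per window position, and the exemplar masks play the role of the segmentations learned offline from true positives; lowering `score_thresh` trades more hypotheses for fewer missed detections, and the segmentation test is what removes the resulting false positives.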