We investigate a method for learning object categories in a weakly supervised manner. Given a set of images known to contain the target category from a similar viewpoint, learning is translation and scaleinvariant; does not require alignment or correspondence between the training images, and is robust to clutter and occlusion. Category models are probabilistic constellations of parts, and their parameters are estimated by maximizing the likelihood of the training data. The appearance of the parts, as well as their mutual position, relative scale and probability of detection are explicitly described in the model. Recognition takes place in two stages. First, a feature-finder identifies promising locations for the model’s parts. Second, the category model is used to compare the likelihood that the observed features are generated by the category model, or are generated by background clutter. The flexible nature of the model is demonstrated by results over six diverse object categori...