Abstract. This contribution proposes a compositional approach to visual object categorization of scenes. Compositions are learned from the Caltech 101 database1 intermediate abstractions of images that are semantically situated between low-level representations and the highlevel categorization. Salient regions, which are described by localized feature histograms, are detected as image parts. Subsequently compositions are formed as bags of parts with a locality constraint. After performing a spatial binding of compositions by means of a shape model, coupled probabilistic kernel classifiers are applied thereupon to establish the final image categorization. In contrast to the discriminative training of the categorizer, intermediate compositions are learned in a generative manner yielding relevant part agglomerations, i.e. groupings which are frequently appearing in the dataset while simultaneously supporting the discrimination between sets of categories. Consequently, compositionality sim...
Björn Ommer, Joachim M. Buhmann