The compositional nature of visual objects significantly limits their representation complexity and renders learning of structured object models tractable. Adopting this modeling strategy we both (i) automatically decompose objects into a hierarchy of relevant compositions and we (ii) learn such a compositional representation for each category without supervision. The compositional structure supports feature sharing already on the lowest level of small image patches. Compositions are represented as probability distributions over their constituent parts and the relations between them. The global shape of objects is captured by a graphical model which combines all compositions. Inference based on the underlying statistical model is then employed to obtain a category level object recognition system. Experiments on large standard benchmark datasets underline the competitive recognition performance of this approach and they provide insights into the learned compositional structure of objec...
Björn Ommer, Joachim M. Buhmann