In this paper we consider the problem of object parsing,
namely detecting an object and its components by composing
them from image observations. Beyond object localization,
this raises the question of how to combine top-down
(model-based) and bottom-up (image-based) information.
We use a hierarchical object model that recursively decomposes
an object into simple structures. Our first contribution
is the formulation of composition rules to build the
object structures, while addressing problems such as contour
fragmentation and missing parts. Our second contribution
is an efficient inference method for object parsing
that addresses the combinatorial complexity of the problem.
To this end, we exploit our hierarchical object representation to
efficiently compute a coarse solution to the problem, which
we then use to guide the search at a finer level. This rules out a
large portion of futile compositions and allows us to parse
complex objects in heavily cluttered scenes.
Iasonas Kokkinos, Alan L. Yuille