This paper introduces a general framework that simultaneously
addresses pose estimation, 2D segmentation, object recognition,
and 3D reconstruction from a single image.
The proposed approach partitions 3D space into
voxels and estimates the voxel states that maximize a likelihood
integrating two components: the object fidelity, that
is, the probability that an object occupies the given voxels,
here encoded as a 3D shape prior learned from 3D samples
of objects in a class; and the image fidelity, meaning the
probability that the given voxels would produce the input
image when properly projected onto the image plane. We
derive a loop-less graphical model for this likelihood and
propose a computationally efficient optimization algorithm
that is guaranteed to produce the global likelihood maximum.
Furthermore, we derive a multi-resolution implementation
of this algorithm that allows reconstruction
and estimation accuracy to be traded for computation. The
presentation of the p...