We propose an approach to overcome the two main challenges
of 3D multiview object detection and localization:
the variation of object features due to changes in the viewpoint
and the variation in the size and aspect ratio of the
object. Our approach proceeds in three steps. Given an
initial bounding box of fixed size, we first refine its aspect
ratio and size. We can then predict the viewing angle, under
the hypothesis that the bounding box actually contains
an object instance. Finally, a classifier tuned to this particular
viewpoint verifies whether an instance is actually present. As a
result, we can find the object instances and estimate their
poses, without having to search over all window sizes and
potential orientations.
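
The three steps above can be read as a short cascade applied to each initial window. The following is only a minimal sketch under stated assumptions, not the implementation evaluated in this work: the names `detect_at`, `refine_box`, `predict_view`, `view_classifiers`, the `Detection` record, the box layout, and the `threshold` parameter are all hypothetical placeholders for the learned components.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Assumed box layout for this sketch: (center_x, center_y, width, height).
Box = Tuple[float, float, float, float]


@dataclass
class Detection:
    box: Box          # refined bounding box
    view_angle: int   # discretized viewing angle (hypothetical encoding, degrees)
    score: float      # output of the view-specific classifier


def detect_at(
    image: object,
    initial_box: Box,
    refine_box: Callable[[object, Box], Box],
    predict_view: Callable[[object, Box], int],
    view_classifiers: Dict[int, Callable[[object, Box], float]],
    threshold: float = 0.5,  # assumed acceptance threshold
) -> Optional[Detection]:
    """Run the three-step cascade on one fixed-size initial window."""
    # Step 1: refine the size and aspect ratio of the initial box.
    box = refine_box(image, initial_box)
    # Step 2: predict the viewing angle, assuming the box contains an instance.
    angle = predict_view(image, box)
    # Step 3: only the classifier tuned to that viewpoint decides on presence.
    score = view_classifiers[angle](image, box)
    if score >= threshold:
        return Detection(box, angle, score)
    return None
```

In this sketch each window triggers one refinement, one viewpoint prediction, and one classifier evaluation, which is what removes the exhaustive search over window sizes and orientations.
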
We train and evaluate our method on a new object
database specifically tailored for this task, containing real-world
objects imaged over a wide range of smoothly varying
viewpoints and significant lighting changes. We show
that the successive estimations of t...