Our objective is to obtain a state-of-the-art object category
detector by employing a state-of-the-art image classifier
to search for the object in all possible image subwindows.
We use the multiple kernel learning of Varma and
Ray (ICCV 2007) to learn an optimal combination of exponential
χ² kernels, each of which captures a different feature
channel. Our features include the distribution of edges,
dense and sparse visual words, and feature descriptors at
different levels of spatial organization.
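
As an illustration of the kernel combination described above, the following is a minimal sketch, not the authors' implementation. It assumes each feature channel is a non-negative histogram, uses one common convention for the χ² distance (with a factor of 1/2), and takes the channel weights and bandwidths as given rather than learning them by multiple kernel learning. The names chi2_distance, exp_chi2_kernel, combined_kernel, weights, and gammas are hypothetical.

```python
import numpy as np


def chi2_distance(x, y, eps=1e-12):
    """Chi-squared distance between two non-negative histograms.

    Assumed convention: 0.5 * sum((x - y)^2 / (x + y)); eps avoids 0/0.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return 0.5 * np.sum((x - y) ** 2 / (x + y + eps))


def exp_chi2_kernel(x, y, gamma=1.0):
    """Exponential chi-squared kernel: exp(-gamma * chi2(x, y))."""
    return float(np.exp(-gamma * chi2_distance(x, y)))


def combined_kernel(feats_a, feats_b, weights, gammas):
    """Weighted sum of per-channel exponential chi-squared kernels.

    feats_a and feats_b are lists with one histogram per feature channel.
    weights are assumed non-negative; in the paper they would be learned,
    here they are simply supplied by the caller.
    """
    return float(sum(
        w * exp_chi2_kernel(a, b, g)
        for a, b, w, g in zip(feats_a, feats_b, weights, gammas)
    ))


if __name__ == "__main__":
    # Two hypothetical channels (e.g. an edge histogram and a visual-word histogram).
    window_a = [np.array([0.2, 0.8]), np.array([0.5, 0.3, 0.2])]
    window_b = [np.array([0.6, 0.4]), np.array([0.1, 0.6, 0.3])]
    print(combined_kernel(window_a, window_b, weights=[0.7, 0.3], gammas=[1.0, 2.0]))
```

Each channel contributes its own kernel value, so a channel with a larger weight has more influence on the final similarity between two windows.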
Such a powerful classifier cannot be tested on all image
subwindows in a reasonable amount of time. Thus we propose
a novel three-stage classifier, which combines linear,
quasi-linear, and non-linear kernel SVMs. We show that
increasing the non-linearity of the kernels increases their
discriminative power, at the cost of an increased computational
complexity. Our contributions include (i) showing
that a linear classifier can be evaluated with a complexity
proportional to the num...