Kernel classifiers based on Support Vector Machines (SVMs) have recently achieved state-of-the-art results on several popular datasets such as Caltech and Pascal. This was made possible by combining the advantages of SVMs – convexity and the availability of efficient optimizers – with ‘hyperkernels’: linear combinations of kernels computed at multiple levels of image encoding. The use of hyperkernels raises several challenges: choosing the kernel weights, the possible inclusion of irrelevant or poorly performing kernels, and an increased number of parameters that can lead to overfitting. In this paper we advocate the transition from SVMs to Support Kernel Machines (SKMs) – models that jointly estimate the parameters of a sparse linear combination of kernels and the parameters of a discriminative classifier. We exploit recent kernel learning techniques, not previously used in computer vision, which show how learning SKMs can be formulated as a convex optimization problem that can be solved efficiently.
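
The central idea – learning both a combination of base kernels and a classifier on the combined kernel – can be illustrated with a toy sketch. The snippet below is not the convex SKM formulation from the paper; it is a crude discrete stand-in that searches a simplex grid of kernel weights and, for each candidate, fits a simple kernel ridge classifier (a hypothetical choice made here for brevity, in place of an SVM). The data, kernels, and helper names are all illustrative assumptions.

```python
import numpy as np

# Toy sketch: pick kernel weights beta (on a coarse simplex grid) jointly
# with a kernel ridge classifier. A crude, discrete stand-in for learning
# a sparse linear combination of kernels plus a discriminative classifier;
# NOT the convex SKM optimization described in the abstract.

rng = np.random.default_rng(0)

# Two-class toy data: class +1 near the origin, class -1 on a ring
# (separable with an RBF kernel but not with a linear one).
n = 60
r = np.concatenate([0.3 * rng.random(n), 1.0 + 0.3 * rng.random(n)])
theta = 2 * np.pi * rng.random(2 * n)
X = np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1)
y = np.concatenate([np.ones(n), -np.ones(n)])

def linear_kernel(A, B):
    return A @ B.T

def rbf_kernel(A, B, gamma=2.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

base_kernels = [linear_kernel, rbf_kernel]

def combined_gram(beta, A, B):
    # Linear combination of base kernels with weights beta.
    return sum(b * k(A, B) for b, k in zip(beta, base_kernels))

def fit_ridge(K, y, lam=1e-2):
    # Kernel ridge classifier: alpha = (K + lam*I)^{-1} y
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

def accuracy(beta):
    K = combined_gram(beta, X, X)
    alpha = fit_ridge(K, y)
    return np.mean(np.sign(K @ alpha) == y)

# Search a coarse simplex grid for the kernel weights.
grid = [(w, 1.0 - w) for w in np.linspace(0.0, 1.0, 11)]
best_beta = max(grid, key=accuracy)
print("best weights:", best_beta, "accuracy:", accuracy(best_beta))
```

The grid search here is where a convex MKL/SKM solver would go: instead of enumerating weight vectors, it would optimize the weights and the classifier parameters jointly, with a sparsity-inducing constraint driving irrelevant kernels' weights to zero.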