The objective of this paper is classifying images by the object categories they contain, for example motorbikes or dolphins. There are three areas of novelty. First, we introduce a descriptor that represents local image shape and its spatial layout, together with a spatial pyramid kernel. These are designed so that the shape correspondence between two images can be measured by the distance between their descriptors using the kernel. Second, we generalize the spatial pyramid kernel, and learn its level weighting parameters (on a validation set). This significantly improves classification performance. Third, we show that shape and appearance kernels may be combined (again by learning parameters on a validation set). Results are reported for classification on Caltech-101 and retrieval on the TRECVID 2006 data sets. For Caltech101 it is shown that the class specific optimization that we introduce exceeds the state of the art performance by more than 10%. Categories and Subject Descrip...