Convolution kernels, which are constructed by convolving sub-kernels defined on the sub-structures of composite objects, are widely used in classification. An important issue in their use is the choice of adequate sub-structures, particularly for objects such as trees, graphs, and sequences. In this paper, we study sub-structure selection for constructing convolution kernels by combining heterogeneous kernels defined on different levels of sub-structures. Because each level of sub-structure offers its own view of the object, combining the corresponding sub-kernels incorporates their individual strengths. We investigate two types of combination, linear and polynomial. We analyze, from the perspective of the feature space, why combined kernels exhibit potential advantages. Experiments indicate that the method is helpful for combining kernels defined on arbitrary levels of sub-structures.
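
As an illustration of the two types of combination, the following minimal sketch combines sub-kernels defined on different levels of sub-structures of a sequence. It rests on assumptions that are not taken from the paper: the sub-kernel at level n is a simple n-gram matching kernel, the linear combination is a non-negatively weighted sum, and the polynomial combination is that sum plus a non-negative offset raised to an integer power. The names `ngram_kernel`, `linear_combination`, and `polynomial_combination`, as well as the weights, degree, and offset, are hypothetical. Both forms remain valid kernels because sums, products, and non-negative scalings of kernels are again kernels.

```python
from collections import Counter


def ngram_kernel(s, t, n):
    """Assumed sub-kernel at level n: number of matching pairs of length-n substrings."""
    grams_s = Counter(s[i:i + n] for i in range(len(s) - n + 1))
    grams_t = Counter(t[i:i + n] for i in range(len(t) - n + 1))
    return sum(count * grams_t[gram] for gram, count in grams_s.items())


def linear_combination(s, t, weights):
    """K(s, t) = sum_n w_n * K_n(s, t), with weights given as {level n: w_n >= 0}."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative to keep the kernel valid")
    return sum(w * ngram_kernel(s, t, n) for n, w in weights.items())


def polynomial_combination(s, t, weights, degree=2, offset=1.0):
    """K(s, t) = (sum_n w_n * K_n(s, t) + offset) ** degree (assumed form)."""
    if degree < 1 or offset < 0:
        raise ValueError("degree must be a positive integer and offset non-negative")
    return (linear_combination(s, t, weights) + offset) ** degree


# Hypothetical weights: unigrams (level 1) and bigrams (level 2).
weights = {1: 1.0, 2: 0.5}
print(linear_combination("abab", "abba", weights))      # 9.5
print(polynomial_combination("abab", "abba", weights))  # 110.25
```

In this sketch the polynomial form expands into products of sub-kernels from different levels, so its implicit feature space contains conjunctions of sub-structures in addition to the individual sub-structures used by the linear form.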