This paper presents a method for estimating the position of an object using contextual information. Although conventional methods use only shape context, color context is also effective for describing scenes, so we use both shape and color contextual information. To estimate the object position from contextual information alone, we use Support Vector Regression. Because our contextual information is represented as histograms, we adopt the Pyramid Match Kernel, which measures the similarity between histograms. When a single kernel is applied to a feature vector that combines color and shape, the similarity of each individual feature is not exploited effectively. We therefore apply kernels to color and shape independently and use the weighted sum of the two kernel outputs. We confirm that the proposed method outperforms conventional methods.
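
The weighted combination of per-feature kernels described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one-dimensional histograms, builds the pyramid by merging adjacent bins, and uses a hypothetical weight `alpha` for the shape kernel (with `1 - alpha` for color). All function names are illustrative.

```python
def intersection(h1, h2):
    """Histogram intersection: total mass shared by corresponding bins."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def coarsen(h):
    """Merge adjacent pairs of bins (assumed pyramid scheme for 1-D histograms)."""
    h = list(h)
    if len(h) % 2:
        h.append(0)  # pad odd-length histograms with an empty bin
    return [h[i] + h[i + 1] for i in range(0, len(h), 2)]


def pyramid_match(h1, h2):
    """Simplified pyramid match: matches first found at level i get weight 1 / 2**i."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    a, b = list(h1), list(h2)
    score, previous, level = 0.0, 0.0, 0
    while True:
        current = intersection(a, b)
        score += (current - previous) / (2 ** level)  # count only new matches
        previous = current
        if len(a) <= 1:
            break
        a, b = coarsen(a), coarsen(b)
        level += 1
    return score


def combined_kernel(x, y, alpha=0.5):
    """Weighted sum of independent shape and color kernels.

    x and y are (shape_histogram, color_histogram) pairs; alpha is a
    hypothetical weight, not a value taken from the paper.
    """
    shape_similarity = pyramid_match(x[0], y[0])
    color_similarity = pyramid_match(x[1], y[1])
    return alpha * shape_similarity + (1.0 - alpha) * color_similarity


def gram_matrix(samples, alpha=0.5):
    """Pairwise kernel values for a list of (shape, color) histogram pairs."""
    return [[combined_kernel(s, t, alpha) for t in samples] for s in samples]
```

A Gram matrix built this way could then be passed to a regressor that accepts precomputed kernels, for example scikit-learn's `SVR(kernel="precomputed")`, with one regressor per coordinate of the object position. How the weight is chosen is not stated here; in this sketch it would have to be tuned, for instance by cross-validation.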