This paper addresses the problem of tracking translation and rotation simultaneously. Starting from a kernel-based spatial-spectral model for object representation, we define a ?-norm similarity measure between the target object and the observation, and derive a new formulation for tracking a translating and rotating object. Based on this formulation, an iterative tracking procedure is proposed. We also develop an adaptive kernel model to cope with appearance variations. Experimental results are presented for both synthetic data and real-world traffic video.
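To make the ingredients concrete, the following is a minimal sketch (not the paper's actual algorithm) of joint translation-rotation tracking with a kernel-weighted histogram representation: an Epanechnikov spatial kernel weights a spectral (intensity) histogram, and candidate translations and rotations are scored by a norm distance between histograms. All function names, the grid-search strategy, and the nearest-neighbor rotated sampling are illustrative assumptions.

```python
import numpy as np

def epanechnikov_weights(h, w):
    # Spatial kernel: weight pixels near the window center more heavily.
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r2 = ((ys - cy) / (h / 2.0)) ** 2 + ((xs - cx) / (w / 2.0)) ** 2
    k = np.clip(1.0 - r2, 0.0, None)
    return k / k.sum()

def kernel_histogram(patch, weights, bins=16):
    # Kernel-weighted intensity histogram: a simple "spatial-spectral" model.
    idx = np.minimum((patch.astype(np.float64) / 256.0 * bins).astype(int), bins - 1)
    hist = np.zeros(bins)
    np.add.at(hist, idx.ravel(), weights.ravel())
    return hist

def sample_window(image, center, size, angle):
    # Sample a rotated window by nearest-neighbor lookup (toy implementation).
    h, w = size
    ys, xs = np.mgrid[0:h, 0:w]
    dy, dx = ys - (h - 1) / 2.0, xs - (w - 1) / 2.0
    c, s = np.cos(angle), np.sin(angle)
    sy = np.clip(np.round(center[0] + c * dy - s * dx).astype(int), 0, image.shape[0] - 1)
    sx = np.clip(np.round(center[1] + s * dy + c * dx).astype(int), 0, image.shape[1] - 1)
    return image[sy, sx]

def track_step(image, template_hist, center, angle, size,
               shifts=range(-3, 4), dthetas=(-0.1, 0.0, 0.1)):
    # One search step over candidate translations and rotations,
    # minimizing the L2 distance between kernel-weighted histograms.
    weights = epanechnikov_weights(*size)
    best = (np.inf, center, angle)
    for dy in shifts:
        for dx in shifts:
            for dth in dthetas:
                cand_c = (center[0] + dy, center[1] + dx)
                cand_a = angle + dth
                patch = sample_window(image, cand_c, size, cand_a)
                d = np.linalg.norm(kernel_histogram(patch, weights) - template_hist)
                if d < best[0]:
                    best = (d, cand_c, cand_a)
    return best[1], best[2]
```

In practice a brute-force grid search like this would be replaced by the paper's iterative procedure, and the template histogram would be updated online to implement an adaptive kernel model.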