This paper proposes a robust video fingerprinting method based on affine covariant regions. In video fingerprinting, a video clip is identified using short feature vectors referred to as fingerprints. In the proposed method, local fingerprints based on the centroid of gradient orientations are extracted from affine covariant regions detected in each frame. Regions are detected with the maximally stable extremal region (MSER) detector, which is considered to have high repeatability and low complexity. For reliable matching of the local fingerprints, only spatio-temporally consistent matches are taken into account. Experimental results show that the proposed method is robust against both geometric and non-geometric transformations.
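
As an illustration of the kind of local fingerprint described above, the following minimal sketch computes one possible form of a centroid-of-gradient-orientations descriptor for a single grayscale patch. It rests on several assumptions that are not stated in the text: the patch is taken to be already normalized from a detected region, the centroid is taken to be the magnitude-weighted mean of the gradient orientation within each block, and the function name `cgo_fingerprint` and the 4x4 block grid are hypothetical choices. It is not the exact descriptor of the proposed method.

```python
import numpy as np


def cgo_fingerprint(patch, grid=(4, 4)):
    """Illustrative descriptor: magnitude-weighted mean gradient orientation per block.

    Assumptions (not from the paper): `patch` is a 2-D grayscale array already
    normalized from a detected region; `grid` is a hypothetical block layout.
    """
    patch = np.asarray(patch, dtype=np.float64)

    # np.gradient returns derivatives along axis 0 (rows, y) then axis 1 (columns, x).
    gy, gx = np.gradient(patch)
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx)  # radians in [-pi, pi]

    # Split row and column indices into roughly equal blocks.
    row_blocks = np.array_split(np.arange(patch.shape[0]), grid[0])
    col_blocks = np.array_split(np.arange(patch.shape[1]), grid[1])

    values = []
    for rows in row_blocks:
        for cols in col_blocks:
            m = magnitude[np.ix_(rows, cols)]
            a = orientation[np.ix_(rows, cols)]
            total = m.sum()
            # Flat blocks have no gradient energy; assign 0.0 by convention.
            values.append((m * a).sum() / total if total > 0 else 0.0)
    return np.array(values)


if __name__ == "__main__":
    # Synthetic patch whose intensity increases from left to right.
    ramp = np.tile(np.arange(16, dtype=np.float64), (16, 1))
    print(cgo_fingerprint(ramp).shape)
```

Because the descriptor depends only on image gradients, adding a constant brightness offset to the patch leaves it unchanged in this sketch, which is one reason gradient-based features tend to tolerate non-geometric distortions.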