Abstract--A new framework, termed Spatially Aligned Pyramid Matching (SAPM), is proposed for Near Duplicate Image Identification. The proposed method robustly handles spatial shifts as well as scale changes, and is extensible to video data. Images are divided into both overlapped and non-overlapped blocks over multiple levels. In the first matching stage, pairwise distances between blocks from the examined image pair are computed from SIFT features using either Earth Mover's Distance (EMD) or a visual-word method based on the χ² distance. In the second stage, multiple alignment hypotheses that consider piecewise spatial shifts and scale variation are postulated and resolved using integer-flow EMD. Moreover, to compute the distance between two videos, we conduct a third matching step (i.e., temporal matching) after spatial matching. Two application scenarios are addressed