We present an efficient implementation of motion estimation (ME) for H.264/AVC on programmable graphics hardware. In H.264/AVC, the ME cost function depends on the motion vector (MV) predictor, which is the median of the MVs of three neighboring coded blocks. Previous implementations assume no dependency among adjacent blocks, which does not hold for H.264/AVC. They also perform unsatisfactorily because of their low arithmetic intensity, defined as the number of operations per word transferred. To overcome the dependency problem, we introduce a new implementation that performs ME on a block-by-block basis. Moreover, the arithmetic intensity can be adjusted easily to optimize performance on different graphics cards. Experimental results show that our implementation is substantially faster (by a factor of 10) than our SIMD-optimized CPU implementation.
Chi-Wang Ho, Oscar C. Au, S.-H. Gary Chan, Shu-Kei
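The dependency described in the abstract can be illustrated with a minimal sketch. It is not the authors' GPU implementation. It assumes a simplified predictor (the component-wise median of three neighboring MVs, ignoring the standard's special cases for unavailable neighbors and partition shapes) and a cost of the common form J = SAD + lambda * R(MV - predictor), with the rate R approximated by signed Exp-Golomb code lengths. The function names and the value of `lam` are hypothetical.

```python
def median_mv(a, b, c):
    """Component-wise median of three neighboring motion vectors (x, y)."""
    return (sorted((a[0], b[0], c[0]))[1],
            sorted((a[1], b[1], c[1]))[1])


def se_golomb_bits(v):
    """Length in bits of the signed Exp-Golomb code for the integer v."""
    # Signed-to-unsigned mapping: 0 -> 0, 1 -> 1, -1 -> 2, 2 -> 3, ...
    code_num = 2 * abs(v) - (1 if v > 0 else 0)
    # Code length is 2 * floor(log2(code_num + 1)) + 1.
    return 2 * (code_num + 1).bit_length() - 1


def mv_cost(sad, mv, pred, lam):
    """Assumed cost form: J = SAD + lam * bits(mv - pred)."""
    rate = se_golomb_bits(mv[0] - pred[0]) + se_golomb_bits(mv[1] - pred[1])
    return sad + lam * rate


# The predictor needs the final MVs of already-coded neighbors, so the cost
# of a candidate MV for the current block cannot be evaluated before those
# neighbors have been processed.
left, top, top_right = (1, 5), (3, -2), (2, 0)
pred = median_mv(left, top, top_right)
cost = mv_cost(sad=100, mv=(3, 0), pred=pred, lam=4)
```

Because `pred` changes whenever a neighbor's MV changes, the same candidate MV with the same SAD can have a different cost, which is why treating adjacent blocks as independent gives a different result for H.264/AVC.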