Microprocessor vendors have provided special-purpose instructions such as psadbw and pdist to accelerate the sum-of-absolute-differences (SAD) similarity measure. Outside the motion estimation kernel, however, these special-purpose instructions are of little use, which has several drawbacks. First, if SAD becomes obsolete because a different similarity metric is adopted, these instructions are no longer useful. Second, they operate only on 8-bit subwords. This precision is insufficient for some kernels, such as motion estimation in the transform domain. In addition, when other n-way parallel SIMD instructions are used to implement SAD and the sum of squared differences (SSD), the speedup obtained is much less than n. This is because of a mismatch between the storage format and the computational format. In this paper, we design and evaluate a variety of SIMD instructions for different data types. We synthesize special-...
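To make the storage-versus-computational-format mismatch concrete, here is a minimal scalar C sketch of SAD and SSD over an n-by-n block of 8-bit pixels. It is an illustrative reference, not the SIMD instructions proposed in the paper; the function names `block_sad` and `block_ssd` and the `stride` (row pitch) parameter are hypothetical. Although each pixel is stored in 8 bits, the signed difference of two pixels needs 9 bits, a squared difference needs up to 16 bits, and the SSD accumulator needs 32 bits, so n-way SIMD code must first widen its 8-bit operands and thereby loses part of its parallelism.

```c
#include <stdint.h>
#include <stdlib.h>

/* Scalar reference SAD over an n-by-n block of 8-bit pixels.
   'stride' is the (hypothetical) row pitch of both blocks, in pixels. */
static uint32_t block_sad(const uint8_t *cur, const uint8_t *ref,
                          int n, int stride)
{
    uint32_t sum = 0;
    for (int y = 0; y < n; y++) {
        for (int x = 0; x < n; x++) {
            /* The difference lies in [-255, 255]: 9 signed bits, not 8. */
            int d = (int)cur[y * stride + x] - (int)ref[y * stride + x];
            sum += (uint32_t)abs(d); /* |d| fits back into 8 unsigned bits */
        }
    }
    return sum;
}

/* Scalar reference SSD over the same block layout. */
static uint32_t block_ssd(const uint8_t *cur, const uint8_t *ref,
                          int n, int stride)
{
    uint32_t sum = 0; /* 32 bits suffice for blocks up to 256x256 pixels */
    for (int y = 0; y < n; y++) {
        for (int x = 0; x < n; x++) {
            int d = (int)cur[y * stride + x] - (int)ref[y * stride + x];
            /* d * d can reach 255 * 255 = 65025, which needs 16 bits. */
            sum += (uint32_t)(d * d);
        }
    }
    return sum;
}
```

A single 16x16 SSD can reach 256 * 65025 = 16,646,400, which overflows any 16-bit accumulator; this is one reason why the computational format must be wider than the 8-bit storage format.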
Asadollah Shahbahrami, Ben H. H. Juurlink, Stamati