SIMD extension is one of the most common and effective technique to exploit data-level parallelism in today’s processor designs. However, the performance of SIMD architectures is limited by some constraints such as mismatch between the storage and the computational formats and using data permutation instructions during vectorization. In our previous work we have proposed two architectural modifications, the extended subwords and the Matrix Register File (MRF) to alleviate the limitations. The extended subwords, uses four extra bits for every byte in a media register and it provides additional parallelism. The MRF allows flexible row-wise as well as column-wise access to the register file and it eliminates data permutation instructions. We have validated the combination of the proposed techniques by studying the performance of some multimedia kernels. In this paper, we analysis each proposed technique separately. In other words, we answer the following questions in this paper. How...
Asadollah Shahbahrami, Ben H. H. Juurlink