There are two avenues for many-core machines to gain higher performance: increasing the number of processors, and increasing the number of vector units in one SIMD processor. A tru...
Xiaohuang Huang, Christopher I. Rodrigues, Stephen...
The latency of broadcast/reduction operations has a significant impact on the performance of SIMD processors. This is especially true for associative programs, which make extensiv...
—This paper focuses on SIMD processor synthesis and proposes a SIMD instruction set/functional unit synthesis algorithm. Given an initial assembly code and a timing constraint, t...
The performance of SIMD processors is often limited by the time it takes to transfer data between the centralized control unit and the parallel processor array. This is especially...