—The Cell processor consists of a general-purpose core and eight cores with a complete SIMD instruction set. Although originally designed for multimedia and gaming, it is currently being used for a much broader range of applications. In this paper we evaluate if the Cell SPEs could benefit significantly from a scalar processing unit using two methodologies. In the first methodology the scalar processing overhead is eliminated by replacing all scalar data types by the quadword data type. This methodology is feasible only for relatively small kernels. In the second methodology SPE performance is compared to the performance of a similarly configured PPU, which supports scalar operations. Experimental results show that the scalar processing overhead ranges from 19% to 57% for small kernels and from 12% to 39% for large kernels. Solutions to eliminate this overhead are also discussed. Keywords-Computer architecture; Datapath; SIMD processing; SIMD overhead;
Arnaldo Azevedo Filho, Ben H. H. Juurlink