Graphics and media processing is quickly emerging as one of the key computing workloads. Programmable graphics processors give designers extra flexibility by running a small program for each fragment in the graphics pipeline. This paper investigates low-cost mechanisms for obtaining good performance from modern graphics programs on a general-purpose CPU. It presents a compiler that takes SIMD graphics programs as input and generates efficient code for a general-purpose CPU. For a group of typical graphics programs, the generated code can process between 0.3 and 25 million vertices per second on a 2.2 GHz Intel Pentium® 4 processor. This paper also evaluates the impact of three changes in the architecture and compiler. Adding support for new specialized instructions improves the performance of the programs by 27.4% on average. A novel compiler optimization called mask analysis improves the performance of the programs by 19.5% on average. Increasing the number of architectural SIMD...
Mauricio Breternitz Jr., Herbert H. J. Hum, Sanjee