Global variable promotion, i.e. allocating unaliased globals to registers, can significantly reduce the number of memory operations. This results in reduced cache activity and less...
This paper explores a novel way to incorporate hardware-programmable resources into a processor microarchitecture to improve the performance of general-purpose applications. Throu...
Tarantula is an aggressive floating point machine targeted at technical, scientific and bioinformatics workloads, originally planned as a follow-on candidate to the EV8 processo...
Roger Espasa, Federico Ardanaz, Julio Gago, Roger ...
Performance auditing is an online optimization strategy that empirically measures the effectiveness of an optimization on a particular code region. It has the potential to greatly...
Jeremy Lau, Matthew Arnold, Michael Hind, Brad Cal...
Three dimensional (3D) graphics applications have become very important workloads running on today's computer systems. A cost-effective graphics solution is to perform geomet...