This paper presents a quantification of the timing effects that advanced processor features like data and instruction caches, pipelines, branch prediction units and out-of-order ex...
ion layer3,4 hides hardware particulars from the higher levels of software but can also compromise performance and compatibility; the higher levels of software often make unwitting...
This paper presents a novel optimizing compiler for general-purpose computation on graphics processing units (GPGPU). It addresses two major challenges of developing high performa...
A trace of a workload’s system calls can be obtained with minimal interference, and can be used to drive repeatable experiments to evaluate system configuration alternatives. R...
As microprocessors become faster, the relative performance cost of memory accesses increases. Bigger and faster caches significantly reduce the absolute load-to-use time delay. Ho...