This paper studies an interesting yet less explored behavior of memory access instructions, called access region locality. Unlike the traditional temporal and spatial data localit...
We recently proposed a new approach to parallelization, by decomposing the time domain, instead of the conventional space domain. This improves latency tolerance, and we demonstrat...
Recently, the high-performance computing community has realized that power is a performance-limiting factor. One reason for this is that supercomputing centers have limited power ...
Robert Springer, David K. Lowenthal, Barry Rountre...
For maximum performance, an out-of-order processor must issue load instructions as early as possible, while avoiding memory-order violations with prior store instructions that wri...
Performance models provide significant insight into the performance relationships between an application and the system used for execution. The major obstacle to developing perfor...
Valerie E. Taylor, Xingfu Wu, Jonathan Geisler, Ri...