We present a fine-grain dynamic instruction placement algorithm for small L0 scratch-pad memories (spms), whose unit of transfer can be an individual instruction. Our algorithm ca...
Energy consumption is of significant concern in battery operated embedded systems. In the processors of such systems, the instruction cache consumes a significant fraction of the ...
This paper presents an extensive characterization, tuning, and optimization of parallel I/O on the Cray XT supercomputer, named Jaguar, at Oak Ridge National Laboratory. We have c...
Abstract--This paper proposes an analytical model to estimate the cost of running an affinity-based thread schedule on multicore systems. The model consists of three submodels to e...
Recent technological advances have produced network interfaces that provide users with very low-latency access to the memory of remote machines. We examine the impact of such netw...
Leonidas I. Kontothanassis, Galen C. Hunt, Robert ...