As the gap between memory and processor speeds continues to widen, cache efficiency is an increasingly important component of processor performance. Compiler techniques have been...
Exploiting compile time knowledge to improve memory bandwidth can produce noticeable improvements at run-time [13, 1]. Allocating the data structure [13] to separate memories when...
This work is based on our philosophy of providing interlayer system-level power awareness in computing systems [26, 27]. Here, we couple this approach with our vision of multipart...
Osman S. Unsal, Israel Koren, C. Mani Krishna, Csa...
This paper presents a new version of the OMPi OpenMP C compiler, enhanced by lightweight runtime support based on user-level multithreading. A large number of threads can be spawne...
Panagiotis E. Hadjidoukas, Vassilios V. Dimakopoul...
This paper presents an analytical model to predict the performance of general-purpose applications on a GPU architecture. The model is designed to provide performance information ...
Sara S. Baghsorkhi, Matthieu Delahaye, Sanjay J. P...