1 This paper presents a simple but effective method to reduce on-chip access latency and improve core isolation in CMP Non-Uniform Cache Architectures (NUCA). The paper introduces ...
Javier Merino, Valentin Puente, Pablo Prieto, Jos&...
One of the essential features in modern computer systems is context switching, which allows multiple threads of execution to time-share a limited number of processors. While very ...
Fang Liu, Fei Guo, Yan Solihin, Seongbeom Kim, Abd...
In this paper, we describe an algorithm and implementation of locality optimizations for architectures with instruction sets such as Intel’s SSE and Motorola’s AltiVec that su...
This paper describes the implementation of a runtime library for asynchronous communication in the Cell BE processor. The runtime library implementation provides with several servi...
Cache behavior modeling is an important part of modern optimizing compilers. In this paper we present a method to estimate the number of cache misses, at compile time, using a mac...