Most modern Chip Multiprocessors (CMP) feature shared cache on chip. For multithreaded applications, the sharing reduces communication latency among co-running threads, but also r...
This paper introduces a compiler-orchestrated prefetching system as a unified framework geared toward ameliorating the gap between processing speeds and memory access latencies. ...
Rodric M. Rabbah, Hariharan Sandanagobalane, Mongk...
Complex embedded systems have always been heterogeneous multicore systems. Because of the tight constraints on power, performance and cost, this situation is not likely to change a...
In recent years, several approaches have been proposed to use profile information in compiler optimization. This profile information can be used at the source level to guide loo...
Masayo Haneda, Peter M. W. Knijnenburg, Harry A. G...
Energy-aware compilers are becoming increasingly important for embedded systems due to the need to meet conflicting constraints on time, code size and power consumption. We intro...