Heterogeneous computing combines general purpose CPUs with accelerators to efficiently execute both sequential control-intensive and data-parallel phases of applications. Existin...
Isaac Gelado, Javier Cabezas, Nacho Navarro, John ...
A microprocessor integrated with DRAM on the same die has the potential to improve system performance by reducing the memory latency and improving the memory bandwidth. However, a...
In this paper we present the problem of flow graph balancing for minimizingthe required memory bandwidth. Our goal is to minimize the required memory bandwidth within the given cy...
Sven Wuytack, Francky Catthoor, Gjalt G. de Jong, ...
Multiple memory module architecture enjoys higher memory access bandwidth and thus higher performance. Two key problems in gaining high performance in this kind of architecture ar...
The cache hierarchy design in existing SMT and superscalar processors is optimized for latency, but not for bandwidth. The size of the L1 data cache did not scale over the past dec...