To improve the performance of a single application on Chip Multiprocessors (CMPs), the application must be split into threads which execute concurrently on multiple cores. In mult...
M. Aater Suleman, Onur Mutlu, Moinuddin K. Qureshi...
Abstract—Convergence of communication, consumer applications and computing within mobile systems pushes memory requirements both in terms of size, bandwidth and power consumption...
Marco Facchini, Trevor Carlson, Anselme Vignon, Ma...
Abstract. This paper presents an innovative architecture for a reconfigurable device that allows single cycle context switching and single cycle random access to the unified on-chi...
Reetinder P. S. Sidhu, Sameer Wadhwa, Alessandro M...
The Convex SPP-1000 is the first commercial implementation of a new generation of scalable shared memory parallel computers with full cache coherence. It employs a hierarchical s...
Thomas L. Sterling, Daniel Savarese, Peter MacNeic...
We present the MBRAM model for static evaluation of the performance of memory-bound programs. The MBRAM model predicts the actual running time of a memory-bound program directly fr...