The Convex SPP-1000 is the first commercial implementation of a new generation of scalable shared memory parallel computers with full cache coherence. It employs a hierarchical s...
Thomas L. Sterling, Daniel Savarese, Peter MacNeic...
Overlapping communication with computation is a well-known technique to increase application performance. While it is commonly assumed that communication and computation can be ov...
Barbara Kreaseck, Larry Carter, Henri Casanova, Je...
Designing and optimizing high performance microprocessors is an increasingly difficult task due to the size and complexity of the processor design space, high cost of detailed si...
P. J. Joseph, Kapil Vaswani, Matthew J. Thazhuthav...
In this paper, we empirically evaluate fundamental design trade-offs among the most recent multicore processors and accelerator technologies. Our primary aim is to aid application...
Efficient implementation of DSP applications is critical for many embedded systems. Optimising C compilers for embedded processors largely focus on code generation and instructio...