Vector prefix and reduction are collective communication primitives in which all processors must cooperate. We present two parallel algorithms, the direct algorithm and the split ...
The Cell Broadband Engine (Cell BE) is a heterogeneous multi-core processor specifically designed to exploit thread-level parallelism. Its memory model comprises a common shared ...
Runtime parallel optimization has been suggested as a means to overcome the difficulties of parallel programming. For runtime parallel optimization to be effective, parallelism a...
David A. Penry, Daniel J. Richins, Tyler S. Harris...
Configurations of contemporary DRAM memory systems are becoming increasingly complex. A recent study [5] shows that application performance is highly sensitive to choices of configura...
This paper reports our experience with the Scalable Network Of Workstation (SNOW) project, which implements a novel methodology to support user-level process migration for traditio...