Abstract. We describe compiler and run-time optimisations for effective auto-parallelisation of C++ programs on the Cell BE architecture. Auto-parallelisation is made easier by anno...
Scalability of parallel architectures is an interesting area of current research. Shared-memory parallel programming is attractive because of its relative ease in transitioning...
Umakishore Ramachandran, Gautam Shah, Ravi Kumar, ...
Shared-memory multiprocessor systems typically provide a set of hardware primitives to support synchronization. Generally, they provide single-word read-modify-write hard...
Deterministic replay systems record and reproduce the execution of a hardware or software system. In contrast to replaying execution on uniprocessors, deterministic replay on mult...
Kaushik Veeraraghavan, Dongyoon Lee, Benjamin West...
Matrix transpose in parallel systems typically involves costly all-to-all communications. In this paper, we provide a comparative characterization of various efficient algorithms f...