This paper examines how the performance of a shared-memory multiprocessor can be improved by including hardware support for block transfers. A system similar to the Hector multipr...
Detecting races is important for debugging shared-memory parallel programs, because the races result in unintended nondeterministic executions of the programs. Previous on-the- y t...
Scalable busy-wait synchronization algorithms are essential for achieving good parallel program performance on large scale multiprocessors. Such algorithms include mutual exclusio...
Robert W. Wisniewski, Leonidas I. Kontothanassis, ...
This paper presents a programming language for parallel computing based on code annotations. It has similar goals and philosophy as OpenMP but it is more tightly coupled to the ob...
Traditionally, cache coherence in large-scale shared-memory multiprocessors has been ensured by means of a distributed directory structure stored in main memory. In this way, the ...