Transactional memory promises to make parallel programming easier than with fine-grained locking, while performing just as well. This performance claim is not always borne out bec...
Abstract. This article presents the C++ library vShark which reduces the intranode communication overhead of parallel programs on clusters of SMPs. The library is built on top of m...
We describe SAFL+: a call-by-value, parallel language in the style of ML which combines imperative, concurrent and functional programming. Synchronous channels allow communication ...
Scalable busy-wait synchronization algorithms are essential for achieving good parallel program performance on large scale multiprocessors. Such algorithms include mutual exclusio...
Robert W. Wisniewski, Leonidas I. Kontothanassis, ...
Efficiently using the hardware capabilities of the Cell processor, a heterogeneous chip multiprocessor that uses several levels of parallelism to deliver high performance, and bei...
Tarik Saidani, Joel Falcou, Claude Tadonki, Lionel...