In this paper we examine how application performance scales on a state-of-the-art shared virtual memory (SVM) system on a cluster with 64 processors, comprising 4-way SMPs connect...
Dongming Jiang, Brian O'Kelley, Xiang Yu, Sanjeev ...
Buffered CoScheduled (BCS) MPI is a novel implementation of MPI based on global synchronization of all system activities. BCS-MPI imposes a model where all processes and their com...
Service-orientation has been proposed as a way of facilitating the development and integration of increasingly complex and heterogeneous system components. However, there are many...
The transition to multithreaded, multi-core designs places a greater responsibility on programmers and software for improving performance; thread-level parallelism (TLP) will be i...
Tipp Moseley, Daniel A. Connors, Dirk Grunwald, Ra...
Computational biology research is now faced with the burgeoning number of genome data. The rigorous postprocessing of this data requires an increased role for high performance com...