In this paper we describe a trace analysis framework, from trace generation to visualization. It includes a unified tracing facility on IBM® SP™ systems, a self-defining interv...
Ching-Farn Eric Wu, Anthony Bolmarcich, Marc Snir,...
This article describes a new benchmark, called the Effective System Performance (ESP) test, which is designed to measure system-level performance, including such factors as job sc...
Adrian T. Wong, Leonid Oliker, William T. C. Krame...
We aimed to study the performance of a parallel implementation of an intraoperative nonrigid registration algorithm that accurately simulates the biomechanical properties of the b...
Simon K. Warfield, Matthieu Ferrant, Xavier Gallez...
As evidenced by the popularity of MPI (Message Passing Interface), message passing is an effective programming technique for managing coarse-grained concurrency on distributed com...
The performance of the MPI’s collective communications is critical in most MPI-based applications. A general algorithm for a given collective communication operation may not giv...
Sathish S. Vadhiyar, Graham E. Fagg, Jack Dongarra
Algebraic multigrid methods offer the hope that multigrid convergence can be achieved (for at least some important applications) without a great deal of effort from engineers an...
We describe the MPI/SX implementation of the MPI-2 standard for one-sided communication (Remote Memory Access) for the NEC SX-5 vector supercomputer. MPI/SX is a non-threaded impl...
We report on our work in developing a fine-grained multithreaded solution for the communicationintensive Conjugate Gradient (CG) problem. In our recent work, we developed a simpl...
Kevin B. Theobald, Gagan Agrawal, Rishi Kumar, Ger...
This paper shows how a state-of-the-art software distributed shared-memory (DSM) protocol can be efficiently extended to tolerate single-node failures. In particular, we extend a ...