Fault tolerance is a major concern for critical high-performance applications using the MPI library. Several protocols provide automatic and transparent fault detection a...
Pierre Lemarinier, Aurelien Bouteiller, Thomas H&e...
Scientific applications often need to access remote file systems. Because of slow networks and large data size, however, remote I/O can become an even more serious performance bot...
Jonghyun Lee, Robert B. Ross, Rajeev Thakur, Xiaos...
A key challenge in supporting data-driven scientific applications is the storage and management of input and output data in a distributed environment. In this paper, we describe a...
Stephen Langella, Shannon Hastings, Scott Oster, T...
JuxtaView is a cluster-based application for viewing ultra-high-resolution images on scalable tiled displays. In JuxtaView, we present a new parallel computing and distributed mem...
Naveen K. Krishnaprasad, Venkatram Vishwanath, Sha...
Grid computing brings with it additional complexities and unexpected failures. Just keeping track of our jobs traversing different grid resources before completion can at times be...
Single system image (SSI) systems have been the mainstay of high-performance computing for many years. SSI requires the integration and aggregation of all types of resources in a c...
The interaction of simultaneously co-allocated jobs can often create contention in the network infrastructure of a dedicated computational grid. This contention can lead to degrad...
William M. Jones, Louis W. Pang, Walter B. Ligon I...
In this paper we discuss issues related to the high-performance implementation of collective communication operations on distributed-memory computer architectures. Using a combina...
E. W. Chan, M. F. Heimlich, Avi Purkayastha, Rober...