Emerging infrastructure of computational grids composed of Clusters-of-Clusters (CoC) interlinked through high-throughput channels promises unprecedented raw compute power for ter...
In multicluster systems, and more generally, in grids, jobs may require co-allocation, i.e., the simultaneous allocation of resources such as processors and input files in multipl...
Gang Scheduling and related techniques are widely believed to be necessary for efficient job scheduling on distributed memory parallel computers. This is because they minimize cont...
The complexity and cost of isolating the root cause of system problems in large parallel computers generally scales with the size of the system. Syslog messages provide a primary ...
Fault tolerance is a very important concern for critical high performance applications using the MPI library. Several protocols provide automatic and transparent fault detection a...
Pierre Lemarinier, Aurelien Bouteiller, Thomas H&e...
Scientific applications often need to access remote file systems. Because of slow networks and large data size, however, remote I/O can become an even more serious performance bot...
Jonghyun Lee, Robert B. Ross, Rajeev Thakur, Xiaos...
A key challenge in supporting data-driven scientific applications is the storage and management of input and output data in a distributed environment. In this paper, we describe a...
Stephen Langella, Shannon Hastings, Scott Oster, T...
JuxtaView is a cluster-based application for viewing ultra-high-resolution images on scalable tiled displays. In JuxtaView, we present a new parallel computing and distributed mem...
Naveen K. Krishnaprasad, Venkatram Vishwanath, Sha...