Abstract. With the number of computing elements spiraling to hundred of thousands in modern HPC systems, failures are common events. Few applications are nevertheless fault toleran...
George Bosilca, Aurelien Bouteiller, Thomas H&eacu...
—Understanding the communication behavior and network resource usage of parallel applications is critical to achieving high performance and scalability on systems with tens of th...
Ron Brightwell, Kevin T. Pedretti, Kurt B. Ferreir...
: Several solutions have been developed to provide dataintensive applications with the highest possible data rates. Such solutions tried to utilize the available network resources ...
PODOS is a performance oriented distributed operating system being developed to harness the performance capabilities of a cluster computing environment. In order to address the gr...
Orca is a portable, object-based distributed shared memory system. This paper studies and evaluates the design choices made in the Orca system and compares Orca with other DSMs. T...
Henri E. Bal, Raoul Bhoedjang, Rutger F. H. Hofman...