This paper shows how a state-of-the-art software distributed shared-memory (DSM) protocol can be efficiently extended to tolerate single-node failures. In particular, we extend a ...
Over the last twenty years, the open source community has provided more and more software on which the world’s High Performance Computing (HPC) systems depend for performance ...
Jack Dongarra, Peter H. Beckman, Terry Moore, Patr...
Abstract. In this paper, we take a MIPS-based open-source uniprocessor soft core, Plasma, and extend it to obtain the Beefarm infrastructure for FPGA-based multiprocessor emulation...
High-performance computing clusters running longlived tasks currently cannot have kernel software updates applied to them without causing system downtime. These clusters miss oppo...
As software Distributed Shared Memory(DSM) systems become attractive on larger clusters, the focus of attention moves toward improving the reliability of systems. In this paper, w...