The productivity of HPC system is determined not only by their performance, but also by their reliability. The conventional method to limit the impact of failures is checkpointing...
: Since the advent of Gnutella, Peer-to-Peer (P2P) protocols have matured towards a fundamental design element for large-scale, self-organising distributed systems. Many research e...
In systems consisting of multiple clusters of processors which employ space sharing for scheduling jobs, such as our Distributed ASCI1 Supercomputer (DAS), coallocation, i.e., the...
Abstract. This paper empirically explores the advantages of the collaboration between different parallel compute sites in a decentralized grid scenario. To this end, we assume ind...
Christian Grimme, Joachim Lepping, Alexander Papas...
In high energy physics, bioinformatics, and other disciplines, we encounter applications involving numerous, loosely coupled jobs that both access and generate large data sets. So...