In today’s high performance computing practice, fail-stop failures are often tolerated by checkpointing. While checkpointing is a very general technique and can often be applied...
Using multiple independent networks (also known as rails) is an emerging technique to overcome bandwidth limitations and enhance fault tolerance of current high-performance parall...
Salvador Coll, Eitan Frachtenberg, Fabrizio Petrin...
The growth in the number of generally available, distributed, heterogeneous computing systems places increasing importance on the development of user-friendly tools that enable ap...
Richard L. Graham, Galen M. Shipman, Brian Barrett...
In this paper we describe the design and implementation of CellSort − a high performance distributed sort algorithm for the Cell processor. We design CellSort as a distributed b...
Applications need to become more concurrent to take advantage of the increased computational power provided by chip level multiprocessing. Programmers have traditionally managed t...
Bratin Saha, Ali-Reza Adl-Tabatabai, Richard L. Hu...