Fault Tolerant MPI (FT-MPI)[6] was designed as a solution to allow applications different methods to handle process failures beyond simple check-point restart schemes. The init...
Graham E. Fagg, Thara Angskun, George Bosilca, Jel...
Server responsiveness and scalability are more important than ever in today’s client/server dominated network environments. Recently, researchers have begun to consider clust...
Xuehong Gan, Trevor Schroeder, Steve Goddard, Byra...
Stampede is a parallel programming system to facilitate the programming of interactive multimedia applications on clusters of SMPs. In a Stampede application, a variable number of...
GPU-based heterogeneous clusters continue to draw attention from vendors and HPC users due to their high energy efficiency and much improved single-node computational performance...
We present a novel protocol, M3L, for multicast tree fault isolation based purely upon end-to-end information. Here, a fault is a link with a loss rate exceeding a specified thres...
Timur Friedman, Donald F. Towsley, James F. Kurose