Fault tolerance in MPI is becoming a major issue in the HPC community. Several approaches are envisioned, from user- or programmer-controlled fault tolerance to fully automatic fault ...
Aurelien Bouteiller, Boris Collin, Thomas Hé...
Fault tolerance is one of the key issues for large-scale applications executed on high-performance computing systems. In a cluster federation, clusters are gathered to provide hug...
A Split decoding algorithm is proposed that divides each row of the parity-check matrix into two or more nearly independent simplified partitions. The proposed method signific...
Push message delivery, where a client maintains an "always-on" connection with a server in order to be notified of an (asynchronous) message arrival in real time, is incre...
The basic operation of a Delay Tolerant Mobile Sensor Network (DTMSN) is pervasive data gathering in networks with intermittent connectivity, where traditional data gathering...
Jinqi Zhu, Jiannong Cao, Ming Liu, Yuan Zheng, Hai...