Sciweavers

442 search results - page 51 / 89
» Fault Tolerant Wide-Area Parallel Computing
Sort
View
CCGRID
2006
IEEE
14 years 1 months ago
IPMI-based Efficient Notification Framework for Large Scale Cluster Computing
The demand for an efficient fault tolerance system has led to the development of complex monitoring infrastructure, which in turn has created an overwhelming task of data and even...
Chokchai Leangsuksun, Tirumala Rao, Anand Tikoteka...
TROB
2002
120views more  TROB 2002»
13 years 9 months ago
DPAC: an object-oriented distributed and parallel computing framework for manufacturing applications
Parallel and distributed computing infrastructure are increasingly being embraced in the context of manufacturing applications, including real-time scheduling. In this paper, we pr...
N. R. Srinivasa Raghavan, Tanmay Waghmare
CCGRID
2006
IEEE
14 years 3 months ago
Proposal of MPI Operation Level Checkpoint/Rollback and One Implementation
With the increasing number of processors in modern HPC(High Performance Computing) systems, there are two emergent problems to solve. One is scalability, the other is fault tolera...
Yuan Tang, Graham E. Fagg, Jack Dongarra
PVM
2010
Springer
13 years 8 months ago
Dodging the Cost of Unavoidable Memory Copies in Message Logging Protocols
Abstract. With the number of computing elements spiraling to hundred of thousands in modern HPC systems, failures are common events. Few applications are nevertheless fault toleran...
George Bosilca, Aurelien Bouteiller, Thomas H&eacu...
CCGRID
2006
IEEE
14 years 3 months ago
MPI-Mitten: Enabling Migration Technology in MPI
Group communications are commonly used in parallel and distributed environment. However, existing migration mechanisms do not support group communications. This weakness prevents ...
Cong Du, Xian-He Sun