Sciweavers

698 search results - page 28 / 140
» Synthesis of Fault-Tolerant Distributed Systems
Sort
View
IPPS
1998
IEEE
13 years 12 months ago
A Flexible Approach for a Fault-Tolerant Router
: Cluster systems gain more and more importance as a platform for parallel computing. In this area the power of the system is strongly coupled with the performance of the network, ...
Andreas C. Döring, Wolfgang Obelöer, Gun...
IPPS
2008
IEEE
14 years 2 months ago
Enhancing application robustness through adaptive fault tolerance
As the scale of high performance computing (HPC) continues to grow, application fault resilience becomes crucial. To address this problem, we are working on the design of an adapt...
Zhiling Lan, Yawei Li, Ziming Zheng, Prashasta Guj...
CCGRID
2003
IEEE
14 years 27 days ago
Improved Read Performance in a Cost-Effective, Fault-Tolerant Parallel Virtual File System (CEFT-PVFS)
Due to the ever-widening performance gap between processors and disks, I/O operations tend to become the major performance bottleneck of data-intensive applications on modern clus...
Yifeng Zhu, Hong Jiang, Xiao Qin, Dan Feng, David ...
IPPS
2005
IEEE
14 years 1 months ago
Current Practice and a Direction Forward in Checkpoint/Restart Implementations for Fault Tolerance
Checkpoint/restart is a general idea for which particular implementations enable various functionalities in computer systems, including process migration, gang scheduling, hiberna...
José Carlos Sancho, Fabrizio Petrini, Kei D...
CCGRID
2008
IEEE
13 years 9 months ago
Fault Tolerance in Cluster Federations with O2P-CF
Fault tolerance is one of the key issues for large scale applications executed on high performance computing systems. In a cluster federation, clusters are gathered to provide hug...
Thomas Ropars, Christine Morin