Sciweavers

116 search results - page 1 / 24
» A Communication Framework for Fault-Tolerant Parallel Execut...
Sort
View
LCPC
2009
Springer
14 years 15 hour ago
A Communication Framework for Fault-Tolerant Parallel Execution
PC grids represent massive computation capacity at a low cost, but are challenging to employ for parallel computing because of variable and unpredictable performance and availabili...
Nagarajan Kanna, Jaspal Subhlok, Edgar Gabriel, Es...
IPPS
2007
IEEE
14 years 1 months ago
Self Adaptive Application Level Fault Tolerance for Parallel and Distributed Computing
Most application level fault tolerance schemes in literature are non-adaptive in the sense that the fault tolerance schemes incorporated in applications are usually designed witho...
Zizhong Chen, Ming Yang, Guillermo A. Francia III,...
IPPS
2005
IEEE
14 years 1 months ago
Fault-Tolerant Parallel Applications with Dynamic Parallel Schedules
Commodity computer clusters are often composed of hundreds of computing nodes. These generally off-the-shelf systems are not designed for high reliability. Node failures therefore...
Sebastian Gerlach, Roger D. Hersch
ICPP
2000
IEEE
13 years 11 months ago
A Problem-Specific Fault-Tolerance Mechanism for Asynchronous, Distributed Systems
The idle computers on a local area, campus area, or even wide area network represent a significant computational resource--one that is, however, also unreliable, heterogeneous, an...
Adriana Iamnitchi, Ian T. Foster
DICS
2006
13 years 11 months ago
Fault-Tolerant Parallel Applications with Dynamic Parallel Schedules: A Programmer's Perspective
Dynamic Parallel Schedules (DPS) is a flow graph based framework for developing parallel applications on clusters of workstations. The DPS flow graph execution model enables automa...
Sebastian Gerlach, Basile Schaeli, Roger D. Hersch