Sciweavers

692 search results for "Balanced High Availability in Layered Distributed Computing ..." (page 79 of 139)
ICPADS
1994
IEEE
Efficient Fault Tolerance: An Approach to Deal with Transient Faults in Multiprocessor Architectures
Dynamic error processing approaches are an important mechanism for increasing the reliability of a multiprocessor system while making efficient use of the available resources. To th...
Andrea Bondavalli, Silvano Chiaradonna, Felicita D...
ICS
2007
Tsinghua U.
A study of process arrival patterns for MPI collective operations
Process arrival pattern, which denotes the timing when different processes arrive at an MPI collective operation, can have a significant impact on the performance of the operatio...
Ahmad Faraj, Pitch Patarasuk, Xin Yuan
CCGRID
2008
IEEE
Using Probabilistic Characterization to Reduce Runtime Faults in HPC Systems
The current trend in high performance computing is to aggregate ever larger numbers of processing and interconnection elements in order to achieve desired levels of compu...
Jim M. Brandt, Bert J. Debusschere, Ann C. Gentile...
ICPADS
2010
IEEE
Fault Tolerant Network Routing through Software Overlays for Intelligent Power Grids
Control decisions of intelligent devices in critical infrastructure can have a significant impact on human life and the environment. Ensuring that the appropriate data is availabl...
Christopher Zimmer, Frank Mueller
HPDC
1999
IEEE
Starfish: Fault-Tolerant Dynamic MPI Programs on Clusters of Workstations
This paper reports on the architecture and design of Starfish, an environment for executing dynamic (and static) MPI-2 programs on a cluster of workstations. Starfish is unique in ...
Adnan Agbaria, Roy Friedman