Sciweavers

SPAA
2009
ACM
14 years 8 months ago
The weakest failure detector for wait-free dining under eventual weak exclusion
Dining philosophers is a classic scheduling problem for local mutual exclusion on arbitrary conflict graphs. We establish necessary conditions to solve wait-free dining under even...
Srikanth Sastry, Scott M. Pike, Jennifer L. Welch
ICS
2004
Tsinghua U.
14 years 1 month ago
Adaptive incremental checkpointing for massively parallel systems
Given the scale of massively parallel systems, occurrence of faults is no longer an exception but a regular event. Periodic checkpointing is becoming increasingly important in the...
Saurabh Agarwal, Rahul Garg, Meeta Sharma Gupta, J...
CCGRID
2008
IEEE
14 years 2 months ago
Performance and Availability Tradeoffs in Replicated File Systems
Replication is a key technique for improving fault tolerance. Replication can also improve application performance under some circumstances, but can have the opposite effect under...
Jiaying Zhang, Peter Honeyman
ICCCN
2007
IEEE
14 years 2 months ago
An Energy-Efficient Scheduling Algorithm Using Dynamic Voltage Scaling for Parallel Applications on Clusters
In the past decade cluster computing platforms have been widely applied to support a variety of scientific and commercial applications, many of which are parallel in nature. Howev...
Xiaojun Ruan, Xiao Qin, Ziliang Zong, Kiranmai Bel...
HCW
1998
IEEE
13 years 12 months ago
CCS Resource Management in Networked HPC Systems
CCS is a resource management system for parallel high-performance computers. At the user level, CCS provides vendor-independent access to parallel systems. At the system administr...
Axel Keller, Alexander Reinefeld