Sciweavers

» Performance under Failures of DAG-based Parallel Computing
IPPS
1999
IEEE
13 years 11 months ago
The Performance of Coordinated and Independent Checkpointing
Checkpointing is a very effective technique to tolerate the occurrence of failures in distributed and parallel applications. The existing algorithms in the literature are basicall...
Luís Moura Silva, João Gabriel Silva
ICDCS
1995
IEEE
13 years 11 months ago
Parallel Processing on Networks of Workstations: A Fault-Tolerant, High Performance Approach
One of the most sought after software innovation of this decade is the construction of systems using off-the-shelf workstations that actually deliver, and even surpass, the power and ...
Partha Dasgupta, Zvi M. Kedem, Michael O. Rabin
CCGRID
2008
IEEE
14 years 1 months ago
Application Resilience: Making Progress in Spite of Failure
While measures such as raw compute performance and system capacity continue to be important factors for evaluating cluster performance, such issues as system reliability...
William M. Jones, John T. Daly, Nathan DeBardelebe...
PODC
2005
ACM
14 years 28 days ago
On reliable broadcast in a radio network
We consider the problem of reliable broadcast in an infinite grid (or finite toroidal) radio network under Byzantine and crash-stop failures. We present bounds on the maximum...
Vartika Bhandari, Nitin H. Vaidya
IPPS
2007
IEEE
14 years 1 months ago
Models and Heuristics for Robust Resource Allocation in Parallel and Distributed Computing Systems
This is an overview of the robust resource allocation research efforts that have been and continue to be conducted by the CSU Robustness in Computer Systems Group. Parallel and di...
David L. Janovy, Jay Smith, Howard Jay Siegel, Ant...