Sciweavers

402 search results - page 19 / 81
» Fault-tolerance in the Borealis distributed stream processin...
JSS
1998
13 years 7 months ago
A taxonomy of distributed termination detection algorithms
An important problem in the field of distributed systems is that of detecting the termination of a distributed computation. Distributed termination detection (DTD) is a difficult p...
Jeff Matocha, Tracy Camp
ICPPW
2009
IEEE
13 years 5 months ago
Analyzing Checkpointing Trends for Applications on the IBM Blue Gene/P System
Current petascale systems have tens of thousands of hardware components and complex system software stacks, which increase the probability of faults occurring during the lifetime ...
Harish Gapanati Naik, Rinku Gupta, Pete Beckman
ICDCS
1996
IEEE
13 years 11 months ago
An Evaluation of the Amoeba Group Communication System
The Amoeba group communication system has two unique aspects: (1) it uses a sequencer-based protocol with negative acknowledgements for achieving a total order on all group messag...
M. Frans Kaashoek, Andrew S. Tanenbaum
JPDC
2008
13 years 7 months ago
Assurance of dynamic adaptation in distributed systems
Long running applications often need to adapt due to changing requirements or changing environment. Typically, such adaptation is performed by dynamically adding or removing compo...
Karun N. Biyani, Sandeep S. Kulkarni
ISCA
2010
IEEE
14 years 14 days ago
Using hardware vulnerability factors to enhance AVF analysis
Fault tolerance is now a primary design constraint for all major microprocessors. One step in determining a processor’s compliance to its failure rate target is measuring the Ar...
Vilas Sridharan, David R. Kaeli