Sciweavers

GI 2004 (Springer)
Crash Management for Distributed Parallel Systems
With the growing complexity of parallel architectures, the probability of system failures grows, too. One approach to coping with this problem is self-healing, one of the organ...
Jan Haase, Frank Eschmann
IPPS 2000 (IEEE)
Consensus Based on Failure Detectors with a Perpetual Accuracy Property
This paper is on the Consensus problem, in the context of asynchronous distributed systems made of n processes, at most f of which may crash. A family of failure detector classes s...
Achour Mostéfaoui, Michel Raynal
EUROPAR 2005 (Springer)
Faults in Large Distributed Systems and What We Can Do About Them
Scientists are increasingly using large distributed systems built from commodity off-the-shelf components to perform scientific computation. Grid computing has expanded the scale ...
George Kola, Tevfik Kosar, Miron Livny
FOSSACS 2006 (Springer)
Distributed Unfolding of Petri Nets
Some recent Petri net-based approaches to fault diagnosis of distributed systems suggest factoring the problem into local diagnoses based on the unfoldings of local views of the sy...
Paolo Baldan, Stefan Haar, Barbara König
ICPP 2007 (IEEE)
A Meta-Learning Failure Predictor for Blue Gene/L Systems
The demand for more computational power in science and engineering has spurred the design and deployment of ever-growing cluster systems. Even though the individual components use...
Prashasta Gujrati, Yawei Li, Zhiling Lan, Rajeev T...