We propose a simple and practical probabilistic comparison-based model, employing multiple incomplete test concepts, for handling fault location in distributed systems using a Bayesian analysis procedure. This approach is more practical and complete than previous ones because it does not assume conditions such as permanently faulty units, complete tests, perfect environments, or non-malicious environments. Fault-free systems are handled without overhead; hence, the test procedure may be used to monitor a functioning system. Given a system S with a specific test graph, the corresponding conditional distribution between the comparison test results (the syndrome) and the fault patterns of S can be generated. To avoid the complex global Bayesian estimation process, we develop a simple bitwise Bayesian (B-) algorithm for fault location in S, which locates system failures with linear complexity and is therefore suitable for hard real-time systems.
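The idea of replacing one global Bayesian estimate with a per-unit ("bitwise") one can be illustrated with a small sketch. This is not the authors' B-algorithm; it is a hypothetical approximation under stated assumptions: each unit has an independent prior fault probability, each comparison test on an edge (u, v) reports a mismatch (1) with a probability that depends only on how many of u and v are faulty (`TestModel`, with made-up numbers modelling false alarms, incomplete detection, and faulty units that agree), and each unit's posterior is computed by marginalising its test partner over that partner's prior. The names `TestModel`, `bitwise_posteriors`, and `locate_faults` are illustrative only. Because every test contributes one likelihood ratio to each endpoint, the work is linear in the number of units plus tests.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

Unit = int
Edge = Tuple[Unit, Unit]


@dataclass(frozen=True)
class TestModel:
    """Hypothetical P(test reports mismatch) given how many endpoints are faulty."""
    p_both_good: float = 0.01  # false alarm caused by an imperfect environment
    p_one_bad: float = 0.90    # incomplete test: a single fault is missed 10% of the time
    p_both_bad: float = 0.90   # two faulty units may still agree (e.g. malicious behaviour)

    def p_mismatch(self, u_bad: bool, v_bad: bool) -> float:
        probs = (self.p_both_good, self.p_one_bad, self.p_both_bad)
        return probs[int(u_bad) + int(v_bad)]


def _outcome_likelihood(model: TestModel, s: int, a_bad: bool, q_partner: float) -> float:
    """P(test outcome s | state of unit a), averaging over the partner's prior."""
    pm = (q_partner * model.p_mismatch(a_bad, True)
          + (1.0 - q_partner) * model.p_mismatch(a_bad, False))
    return pm if s == 1 else 1.0 - pm


def bitwise_posteriors(prior: Dict[Unit, float],
                       syndrome: Dict[Edge, int],
                       model: TestModel) -> Dict[Unit, float]:
    """Approximate P(unit faulty | syndrome) one unit at a time.

    Priors must lie strictly between 0 and 1. Runs in O(|units| + |tests|).
    """
    log_odds = {u: math.log(p / (1.0 - p)) for u, p in prior.items()}
    for (u, v), s in syndrome.items():
        # Each test updates both endpoints independently.
        for a, b in ((u, v), (v, u)):
            q = prior[b]
            log_odds[a] += (math.log(_outcome_likelihood(model, s, True, q))
                            - math.log(_outcome_likelihood(model, s, False, q)))
    return {u: 1.0 / (1.0 + math.exp(-lo)) for u, lo in log_odds.items()}


def locate_faults(posterior: Dict[Unit, float], threshold: float = 0.5) -> List[Unit]:
    """Declare a unit faulty when its approximate posterior exceeds the threshold."""
    return sorted(u for u, p in posterior.items() if p > threshold)


if __name__ == "__main__":
    # Four units, prior fault probability 0.1 each; unit 2 disagrees with every partner.
    prior = {u: 0.1 for u in range(4)}
    syndrome = {(0, 1): 0, (1, 2): 1, (2, 3): 1, (3, 0): 0, (0, 2): 1}
    post = bitwise_posteriors(prior, syndrome, TestModel())
    print(locate_faults(post))  # → [2]
```

The per-unit treatment ignores correlations between neighbours' fault states, which is what keeps the cost linear; a full Bayesian estimate over all fault patterns would instead grow exponentially with the number of units.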
Yu Lo Cyrus Chang, Leslie C. Lander, Horng-Shing L