The increasing complexity of configurable software systems creates a need for more intelligent sampling mechanisms to detect and locate failure-inducing dependencies between confi...
Adam A. Porter, Myra B. Cohen, Sandro Fouché...
In this paper, we consider the problem of supporting fault tolerance for adaptive and time-critical applications in heterogeneous and unreliable grid computing environments. Our g...
We expect that, in the future, commodity hardware will be used in safety-critical applications. However, the commodity microprocessors used will become less reliable because of decreasing fe...
The paper discusses a distributed approach for monitoring and diagnosing the execution of a plan where concurrent actions are performed by a team of cooperating agents. T...
In this paper we address the problem of selecting probe station locations from which probes can be sent to monitor all the nodes in the network. Probe station placement involves...