The high dimensionality of system observation, together with the frequent changes of system normal behavior resulting from workload variations, makes fault detection very difficu...
In this paper, we describe a scheme for tolerating and recovering from mid-query faults in a distributed shared nothing database. Rather than aborting and restarting queries, our s...
Christopher Yang, Christine Yen, Ceryen Tan, Samue...
— This paper presents a distributed version of our previous work, called SAFDetection, which is a sensor analysisbased fault detection approach that is used to monitor tightlycou...
Formal methods can improve the development of systems with high quality requirements, since they usually o er a precise, nonambiguous speci cation language and allow rigorous veri ...
The increasing complexity of today’s systems makes fast and accurate failure detection essential for their use in mission-critical applications. Various monitoring methods provi...