Systems on chip (SoC) have much in common with traditional (networked) distributed systems in that they consist of largely independent components with dedicated communication inte...
Running unit tests suites with contemporary tools such as JUNIT can show the presence of bugs, but not their locations. This is different from checking a program with a compiler, w...
We present fault localization techniques suitable for diagnosing end-to-end service problems in communication systems with complex topologies. We refine a layered system model th...
Abstract--Fault localization in enterprise networks is extremely challenging. A recent approach called Sherlock makes some headway into this problem by using an inference algorithm...
Fault tolerance is a major concern to guarantee availability of critical services as well as application execution. Traditional approaches for fault tolerance include checkpoint/r...