We propose a new fault localization technique for software bugs in large-scale computing systems. Our technique always collects per-process function call traces of a target system...
We present a statistical debugging algorithm that isolates bugs in programs containing multiple undiagnosed bugs. Earlier statistical algorithms that focus solely on identifying p...
Ben Liblit, Mayur Naik, Alice X. Zheng, Alexander ...
As software Distributed Shared Memory (DSM) systems become attractive on larger clusters, the focus of attention moves toward improving the reliability of systems. In this paper, w...
The global Internet routing infrastructure is a large and complex distributed system where routing changes occur constantly. Our objective in this paper is to develop a simple ...
Mohit Lad, Ricardo V. Oliveira, Daniel Massey, Lix...
As device geometries continue to shrink, single event upsets are becoming of concern to a wider spectrum of system designers. These “soft errors” can be a nuisance or catastro...