We propose a new fault localization technique for software bugs in large-scale computing systems. Our technique always collects per-process function call traces of a target system...
Process monitoring refers to the task of detecting abnormal process operations resulting from a shift in the mean and/or the variance of one or more process variables. To success...
As chip densities and clock rates increase, processors are becoming more susceptible to transient faults that can affect program correctness. Up to now, system designers have prim...
George A. Reis, Jonathan Chang, Neil Vachharajani,...
We consider a fixed, undirected, known network and a number of "mobile agents" that can traverse the network in synchronized steps. Some nodes in the network m...
Commodity computer clusters are often composed of hundreds of computing nodes. These generally off-the-shelf systems are not designed for high reliability. Node failures therefore...