Faults in distributed systems can result in errors that manifest in several ways, potentially even in parts of the system that are not collocated with the root cause. These manife...
Andrew W. Williams, Soila M. Pertet, Priya Narasim...
The problem of optimal allocation of monitoring resources for tracking transactions progressing through a distributed system, modeled as a queueing network, is considered. Two for...
Detecting impending failure of hard disks is an important prediction task which might help computer systems to prevent loss of data and performance degradation. Currently most of t...
Operational network data, management data such as customer care call logs and equipment system logs, is a very important source of information for network operators to detect prob...
Chi-Yao Hong, Matthew Caesar, Nick G. Duffield, Ji...
In this paper, we present the design, implementation, and evaluation of iPlane, a scalable service providing accurate predictions of Internet path performance for emerging overlay...
Harsha V. Madhyastha, Tomas Isdal, Michael Piatek,...