Large-scale parallel computing is relying increasingly on clusters with thousands of processors. At such large counts of compute nodes, faults are becoming common place. Current t...
Arun Babu Nagarajan, Frank Mueller, Christian Enge...
In the synchronous composition of processes, one process may prevent another process from proceeding unless compositions without a wellde ned productbehavior are ruled out. They ca...
Luca de Alfaro, Thomas A. Henzinger, Freddy Y. C. ...
Abstract. We present PerfMiner, a system for the transparent collection, storage and presentation of thread-level hardware performance data across an entire cluster. Every sub-proc...
Philip Mucci, Daniel Ahlin, Johan Danielsson, Per ...
In this paper we consider the problem of monitoring detecting separation of agents from a base station in robotic and sensor networks. Such separation can be caused by mobility an...
In many distributed environments, the primary function of monitoring software is to detect anomalies, that is, instances when system behavior deviates substantially from the norm....
Shipra Agrawal, Supratim Deb, K. V. M. Naidu, Raje...