We report on an automated runtime anomaly detection method at the application layer of multi-node computer systems. Although several network management systems are available in th...
Power consumption is a troublesome design constraint for emergent systems such as IBM’s BlueGene /L. If current trends continue, future petaflop systems will require 100 megawat...
As chip densities and clock rates increase, processors are becoming more susceptible to transient faults that can affect program correctness. Up to now, system designers have prim...
George A. Reis, Jonathan Chang, Neil Vachharajani,...
Distributed storage systems provide data availability by means of redundancy. To assure a given level of availability in case of node failures, new redundant fragments need to be ...
Alessandro Duminuco, Ernst Biersack, Taoufik En-Na...
It is considered good distributed computing practice to devise object implementations that tolerate contention, periods of asynchrony and a large number of failures, but perform f...