Distributed stream processing systems (DSPSs) have many important applications such as sensor data analysis, network security, and business intelligence. Failure management is ess...
Xiaohui Gu, Spiros Papadimitriou, Philip S. Yu, Sh...
Faults in distributed systems can result in errors that manifest in several ways, potentially even in parts of the system that are not collocated with the root cause. These manife...
Andrew W. Williams, Soila M. Pertet, Priya Narasim...
In large-scale clusters and computational grids, component failures become norms instead of exceptions. Failure occurrence as well as its impact on system performance and operatio...
The growing computational and storage needs of several scientific applications mandate the deployment of extreme-scale parallel machines, such as IBM’s BlueGene/L which can acc...
One has a large workload that is “divisible” (its constituent work’s granularity can be adjusted arbitrarily) and one has access to p remote computers that can assist in comp...
Anne Benoit, Yves Robert, Arnold L. Rosenberg, Fr&...