Convergence is often the key liveness property for distributed systems that interact with physical processes. Techniques for proving convergence (asymptotic stability) have been ex...
Replication is a key technique for improving fault tolerance. Replication can also improve application performance under some circumstances, but can have the opposite effect under...
A major component of many cloud services is query processing on data stored in the underlying cloud cluster. The traditional techniques for query processing on a cluster are those...
Adrian Daniel Popescu, Debabrata Dash, Verena Kant...
The goal of online failure prediction is to forecast imminent failures while the system is running. This paper compares Similar Events Prediction (SEP) with two other well-known t...
In large-scale clusters and computational grids, component failures become norms instead of exceptions. Failure occurrence as well as its impact on system performance and operatio...