—Ultra large scale (ULS) systems are future software intensive systems that have billions of lines of code, composed of heterogeneous, changing, inconsistent and independent elem...
— Today’s system monitoring tools are capable of detecting system failures such as host failures, OS errors, and network partitions in near-real time. Unfortunately, the same c...
Dan Gunter, Brian Tierney, Aaron Brown, D. Martin ...
In this paper, we describe the design and implementation of two mechanisms for fault-tolerance and recovery for complex scientific workflows on computational grids. We present our ...
Computational grids can integrate geographically distributed resources into a seamless environment. To facilitate managing these heterogenous resources, the virtual machine gy pro...
Dependability is a key factor in any software system due to the potential costs in both time and money a failure may cause. Given the complexity of Grid applications that rely on ...