Fault tolerance is an important issue for large machines with tens or hundreds of thousands of processors. Checkpoint-based methods, currently used on most machines, rollback all ...
One of the mostsoughtaftersoftware innovation of thisdecade is the construction of systems using off-the-shelf workstations that actually deliver, and even surpass, the power and ...
Dynamic error processing approaches are an important mechanism to increase the reliability in a multiprocessor system, while making efficient use of the available resources. To th...
Andrea Bondavalli, Silvano Chiaradonna, Felicita D...
Parallel processors such as SIMD computers have been successfully used in various areas of high performance image and data processing. Due to their characteristics of highly regula...
Clusters and distributed systems offer fault tolerance and high performance through load sharing. When all computers are up and running, we would like the load to be evenly distrib...