In this paper, we present a Dynamic Load Balancing (DLB) policy for problems characterized by a highly irregular search tree, whereby no reliable workload prediction is available....
Recently there has been renewed interest in building reliable servers that support continuous application operation. Besides maintaining system state consistent after a failure, o...
We describe and test a software approach to overcoming radiation-induced errors in spaceborne applications running on commercial off-the-shelf components. The approach uses checks...
Fault tolerance is one of the key issues for large scale applications executed on high performance computing systems. In a cluster federation, clusters are gathered to provide hug...
In this paper, we analyze the fault tolerance of several bounded-degree networks that are commonly used for parallel computation. Among other things, we show that an N-node butterf...
Frank Thomson Leighton, Bruce M. Maggs, Ramesh K. ...