Soft errors due to cosmic rays cause reliability problems during lifetime operation of digital systems, which increase exponentially with Moore’s law. The first step in develo...
This paper presents the design and implementation of the Distributed Autonomous Replication Management (DARM) framework built on top of the Spread group communication system. The ...
Many cooperative web cache systems and protocols have been proposed. However, these systems require expensive resources, such as core-link bandwidth and proxy CPU or storage, and need...
Checkpointing and rollback recovery is a very effective technique to tolerate transient faults and preventive shutdowns. In the past, most of the checkpointing schemes published i...
We have entered an era where chip yields are decreasing with scaling. A new concept called intelligible testing has been previously proposed with the goal of reversing this tre...