Dual-execution/checkpointing-based transient-error tolerance techniques have been widely used in high-end mission-critical systems. These techniques, however, are not very att...
To address the limitations of centralized shared storage for cloud computing, we are building Lithium, a distributed storage system designed specifically for virtualization workl...
Maintenance is the dominant source of downtime at high-availability sites. Unfortunately, the dominant mechanism for reducing this downtime, cluster rolling upgrade, has two short...
Today’s complex applications must cope with the distribution of data and code among different network nodes. Java is a widespread language that allows developers to build complex so...
The ability to offload functionality to a programmable network interface is appealing, both for increasing message passing performance and for reducing the overhead on t...