This paper proposes a novel way to use virtual memory-mapped communication (VMMC) to reduce the failover time on clusters. With the VMMC model, applications’ virtual address space can be efficiently mirrored on remote memory either automatically or via explicit messages. When a machine fails, its applications can restart from the most recent checkpoints on the failover node with minimal memory copying and disk I/O overhead. This method requires little change to applications’ source code. We developed two fast failover protocols: the deliberate update failover protocol (DU) and the automatic update failover protocol (AU). The first can run on any system that supports VMMC, whereas the second requires special network interface support. We implemented these two protocols on two different clusters that support VMMC. Our results with three transaction-based applications show that both protocols work quite well. The deliberate update protocol imposes 4-21% overhead when taking c...
Yuanyuan Zhou, Peter M. Chen, Kai Li