This paper presents a software architecture for hardware fault tolerance based on loosely-synchronized, redundant virtual machines (LSRVM). LSRVM will provide high levels of reliability by tolerating hardware faults at all levels of the system. Historically, such hardware fault tolerance has only been achievable using custom-designed hardware and proprietary operating systems. Today, however, technological trends and economic factors are driving a reduction in the amount of custom-designed hardware. We believe that this path should be followed to its ultimate conclusion: a highly available, fault-tolerant computing system based entirely on commodity hardware and open-source operating systems. Our revolutionary approach utilizes virtualization to efficiently provide redundancy on modern commodity hardware. When combined with existing application-level fault tolerance mechanisms, LSRVM will provide very high levels of reliability at extremely low cost.
Alan L. Cox, Kartik Mohanram, Scott Rixner