Maintaining Network QoS Across NIC Device Driver Failures Using Virtualization

Michael Le, Andrew Gallagher, Yuval Tamir, Yoshio Turner
HP Laboratories
HPL-2009-115

Keywords: device driver, recovery, virtual machine, fault tolerance, QoS, network, dependability, resiliency

Abstract: Device driver failures have been shown to be a major cause of system failures. Network services stress NIC device drivers, increasing the probability that NIC driver bugs manifest as server failures. System virtualization is increasingly used for server consolidation and management. The isolated driver domain (IDD) architecture used by several virtual machine monitors, such as Xen, forms a natural foundation for making systems resilient to NIC driver failures. To realize this potential, recovery must be fast enough to maintain QoS for network services across NIC driver failures. We show that the standard Xen configuration, enhanced with simple detection and recovery mechanisms, cannot provide such QoS. Howe...