Left unchecked, the fundamental drive to increase peak performance using tens of thousands of power hungry components will lead to intolerable operating costs and failure rates. R...
The soundness of device drivers generally cannot be verified in isolation, but has to take into account the reactions of the hardware devices. In critical embedded systems, interf...
Recently there has been renewed interest in building reliable servers that support continuous application operation. Besides maintaining system state consistent after a failure, o...
With technology scaling, manufacture-time and in-field permanent faults are becoming a fundamental problem. Multi-core architectures with spares can tolerate them by detecting an...
Shuou Nomura, Matthew D. Sinclair, Chen-Han Ho, Ve...
Windows Error Reporting (WER) is a distributed system that automates the processing of error reports coming from an installed base of a billion machines. WER has collected billion...
Kirk Glerum, Kinshuman Kinshumann, Steve Greenberg...