We describe the design and implementation of a clustering service for a high-performance, shared-disk file system. The service provides failure detection and recovery, reliable e...
As software Distributed Shared Memory (DSM) systems become attractive on larger clusters, the focus of attention moves toward improving the reliability of systems. In this paper, w...
This paper presents general algorithms for concurrency control in tree-based access methods as well as a recovery protocol and a mechanism for ensuring repeatable read. The algori...
Scalable reliable multicast protocols have been the subject of much research in recent years. We propose a new protocol that groups receivers for error recovery into fixed-size gr...
As transistor dimensions continue to scale deep into the nanometer regime, silicon reliability is becoming a chief concern. At the same time, transistor counts are scaling up, ena...
Andrew DeOrio, Konstantinos Aisopos, Valeria Berta...