Massively parallel computing systems are being built with thousands of nodes. Because of the high number of components, it is critical to keep these systems running even in the pre...
Writing correct distributed programs is hard. In spite of extensive testing and debugging, software faults persist even in commercial grade software. Many distributed systems, esp...
PC grids represent massive computation capacity at a low cost, but are challenging to employ for parallel computing because of variable and unpredictable performance and availabili...
Nagarajan Kanna, Jaspal Subhlok, Edgar Gabriel, Es...
This paper describes the design and implementation of a fault-tolerant CORBA naming service - CosNamingFT. Every CORBA object is accessed through its Interoperable Object Referenc...
Lau Cheuk Lung, Joni da Silva Fraga, Jean-Marie Fa...
Abstract—This paper describes a method to implement faulttolerant services in distributed systems based on the idea of fused state machines. The theory of fused state machines us...