Sciweavers

IPPS
1998
IEEE

Failure Recovery for Distributed Processes in Single System Image Clusters

Single System Image (SSI) distributed operating systems have been the subject of increasing interest in recent years. This interest has been fueled primarily by the trend towards hardware designs that address the scalability problems of traditional Symmetric Multiprocessor (SMP) architectures. These designs run the gamut from inexpensive compute nodes connected by high-speed interconnects to architectures in which some or all memory is shared between nodes. As machines scale to large numbers of nodes, it becomes increasingly intolerable to allow the failure of any single node to bring down the entire system. Handling such failures can dramatically improve overall system reliability and availability. Among the various components of a distributed operating system, the distributed processing component poses significant failure recovery challenges. This is owing to the large number of relationships processes can participate in and the potential for process state to be dis...
Jeffrey Zabarsky
Added 05 Aug 2010
Updated 05 Aug 2010
Type Conference
Year 1998
Where IPPS
Authors Jeffrey Zabarsky