Symmetric Active/Active High Availability for High-Performance Computing System Services



Abstract-- This work aims to pave the way for high availability in high-performance computing (HPC) by focusing on efficient redundancy strategies for head and service nodes. These nodes are single points of failure and control for an entire HPC system: when one fails, the system becomes inaccessible and unmanageable until it is repaired. The presented approach introduces two distinct replication methods, internal and external, for providing symmetric active/active high availability for multiple redundant head and service nodes running in virtual synchrony. The approach utilizes an existing process group communication system for service group membership management and for reliable, totally ordered message delivery. Results of a prototype implementation that offers symmetric active/active replication for HPC job and resource management using external replication show that the highest level of availability can be provided with an acceptable performance trade-off.
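The core idea behind symmetric active/active replication is that all redundant head nodes stay identical because every request reaches every replica in the same total order and each replica processes it deterministically. The following is a minimal Python sketch under stated assumptions, not the authors' prototype: `TotalOrderGroup` is a hypothetical single-process sequencer standing in for a real process group communication system, `JobQueueReplica` is a hypothetical deterministic job queue standing in for HPC job and resource management, and a node failure is modeled simply as leaving the group.

```python
from dataclasses import dataclass, field


@dataclass
class JobQueueReplica:
    """Hypothetical head-node replica of a deterministic job-queue service."""
    name: str
    jobs: list = field(default_factory=list)
    applied_seq: int = 0

    def deliver(self, seq: int, command: tuple) -> None:
        # Total order: each replica must see commands as 1, 2, 3, ... with no gaps.
        assert seq == self.applied_seq + 1, "gap or reordering in delivery"
        op, job = command
        if op == "submit":
            self.jobs.append(job)
        elif op == "cancel" and job in self.jobs:
            self.jobs.remove(job)
        self.applied_seq = seq


class TotalOrderGroup:
    """Hypothetical sequencer-based total-order broadcast with simple membership."""

    def __init__(self):
        self.members = []
        self.seq = 0
        self.log = []  # ordered command history, replayed to joining replicas

    def join(self, replica: JobQueueReplica) -> None:
        # Simplified state transfer: replay the ordered history to the newcomer.
        for seq, cmd in self.log:
            replica.deliver(seq, cmd)
        self.members.append(replica)

    def leave(self, replica: JobQueueReplica) -> None:
        # Membership change, e.g. after a detected head-node failure.
        self.members.remove(replica)

    def broadcast(self, command: tuple) -> None:
        # Assign the next global sequence number and deliver to every member.
        self.seq += 1
        self.log.append((self.seq, command))
        for member in self.members:
            member.deliver(self.seq, command)


group = TotalOrderGroup()
a, b = JobQueueReplica("head-a"), JobQueueReplica("head-b")
group.join(a)
group.join(b)
group.broadcast(("submit", "job1"))
group.broadcast(("submit", "job2"))
group.leave(a)                       # head-a fails; head-b keeps serving
group.broadcast(("cancel", "job1"))
c = JobQueueReplica("head-c")
group.join(c)                        # replacement head node catches up
print(b.jobs, c.jobs)                # ['job2'] ['job2']
```

Because any member can accept a request and forward it through the group, no replica is a distinguished primary; a surviving head node keeps serving after another leaves, and a replacement converges by replaying the ordered history. A real group communication system would additionally handle failure detection, view agreement, and state transfer under concurrent traffic, which this sketch omits.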

Christian Engelmann, Stephen L. Scott, Chokchai Leangsuksun, Xubin (Ben) He


Distinct Replication Methods | Efficient Redundancy Strategies | JCP 2006 | Multiple Redundant Head


» Transparent Symmetric Active/Active Replication for Service-Level High Availability

» Event Services for High Performance Computing

» Enabling the P2P JXTA Platform for High-Performance Networking Grid Infrastructures

» A Fault-Tolerant Middleware Architecture for High-Availability Storage Services

» Performance Modeling and Prediction of Non-dedicated Network Computing

» Dynamic Access Control in a Content-based Publish/Subscribe System with Delivery Guarantees

» Xen-Based HPC: A Parallel I/O Perspective

» Automatic Configuration of Internet Services

Post Info

Added	13 Dec 2010
Updated	13 Dec 2010
Type	Journal
Year	2006
Where	JCP
Authors	Christian Engelmann, Stephen L. Scott, Chokchai Leangsuksun, Xubin (Ben) He


Sciweavers
