HPCA
2003
IEEE

Dynamic Data Replication: An Approach to Providing Fault-Tolerant Shared Memory Clusters

Download www.ics.forth.gr

A challenging issue in today's server systems is to transparently deal with failures and application-imposed requirements for continuous operation. In this paper we address this problem in shared virtual memory (SVM) clusters at the programming abstraction layer. We design extensions to an existing SVM protocol that has been tuned for low-latency, high-bandwidth interconnects and SMP nodes, and we achieve reliability through dynamic replication of application shared data and protocol information. Our extensions allow us to tolerate single (or multiple, but not simultaneous) node failures. We implement our extensions on a state-of-the-art cluster and evaluate the common, failure-free case. We find that, although the complexity of our protocol is substantially higher than that of its failure-free counterpart, by taking advantage of architectural features of modern systems our approach imposes low overhead and can be employed for transparently dealing with system failures.

Rosalia Christodoulopoulou, Reza Azimi, Angelos Bilas

Application Shared Data | Computer Architecture | HPCA 2003 | Node Failures | SVM Protocol

Related Content

» Transparent Fault Tolerance for Parallel Applications on Networks of Workstations

» Data Replication Strategies for Fault Tolerance and Availability on Commodity Clusters

» Low-Overhead Protocols for Fault-Tolerant File Sharing

» Fault-Tolerant Distributed-Shared-Memory on a Broadcast-Based Interconnection Network

» Osprey: Implementing MapReduce-Style Fault Tolerance in a Shared-Nothing Distributed Database

» Towards Optimal Resource Allocation in Partial-Fault Tolerant Applications

» Fast and transparent recovery for continuous availability of cluster-based servers

» Proposal of MPI Operation Level Checkpoint/Rollback and One Implementation

» ReCon: A Fast and Reliable Replica Retrieval Service for the Data Grid

Post Info

Added	01 Dec 2009
Updated	01 Dec 2009
Type	Conference
Year	2003
Where	HPCA
Authors	Rosalia Christodoulopoulou, Reza Azimi, Angelos Bilas
