Practical Fault-Tolerant Framework for eScience Infrastructure

Many areas of science currently use computing resources as an important part of their research, and many research groups adopt a cluster architecture to use them efficiently and manage them easily. Therefore, fault tolerance becomes a very important property for the computing resources. However, fault-tolerant systems have not yet been widely adopted because they are either hard to deploy, hard to use, hard to manage, hard to maintain, or hard to justify. This paper proposes a practical fault-tolerant system for eScience infrastructures. Our system uses a checkpoint/restart mechanism for fault tolerance and provides an easy mechanism to integrate with Grid services widely used in eScience. Additionally, we run rigorous tests using scientific applications to verify that our system can be used in clusters. We also describe improvements made to our system to solve various problems that arose when deploying it on a cluster. The experimental results show that not only does our system conform to...

Hyuck Han, Jai Wug Kim, Jongpil Lee, Youngjin Yu, Kiyoung Kim, Heon Young Yeom

Computing Resources | ESCIENCE 2006 | Fault-tolerant System | Groups Adopt Cluster |

Post Info

Added	11 Jun 2010
Updated	11 Jun 2010
Type	Conference
Year	2006
Where	ESCIENCE
Authors	Hyuck Han, Jai Wug Kim, Jongpil Lee, Youngjin Yu, Kiyoung Kim, Heon Young Yeom

Sciweavers
