
GRID
2006
Springer

Implementation of Fault-Tolerant GridRPC Applications

In this paper, a task-parallel application is implemented with Ninf-G, a GridRPC system, and run on a Grid testbed in the Asia Pacific region for three months. The application is designed to run for a long time, and typical fault patterns were gathered over tens of long executions. Unstable network throughput was found to be one of the main causes of faults. The paper then stresses a point for application developers: fault handling should not cause a serious decline in task throughput, which can be avoided by minimizing the timeout for fault detection, performing recovery in the background, and assigning tasks in duplicate. The study also offers guidance for the design of an automated fault-tolerant mechanism in a higher layer of the GridRPC framework.
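Two of the techniques named in the abstract, a short detection timeout and duplicate task assignment, can be illustrated with a minimal Python sketch. It is not the paper's implementation and does not use the Ninf-G or GridRPC API: the workers are plain local callables standing in for remote servers, and `run_with_duplicates`, `healthy`, `stalled`, and `broken` are hypothetical names chosen for this illustration. Background recovery is not covered.

```python
import concurrent.futures as cf
import time


def run_with_duplicates(task, workers, timeout_s):
    """Send the same task to every worker and return the first good answer.

    workers   -- callables standing in for remote servers (assumption)
    timeout_s -- short detection timeout; if no replica has succeeded by
                 then, the call is treated as failed
    Returns (index_of_worker, result).
    """
    pool = cf.ThreadPoolExecutor(max_workers=len(workers))
    futures = {pool.submit(w, task): i for i, w in enumerate(workers)}
    try:
        for fut in cf.as_completed(futures, timeout=timeout_s):
            if fut.exception() is None:
                # First replica to succeed wins; slower ones are ignored.
                return futures[fut], fut.result()
        raise RuntimeError("all replicas failed")
    except cf.TimeoutError:
        raise RuntimeError("no replica answered within the timeout") from None
    finally:
        # Do not block on stalled replicas; let them finish on their own.
        pool.shutdown(wait=False)


def healthy(task):
    return task * 2


def stalled(task):
    time.sleep(0.5)  # simulates a server behind a slow network link
    return task * 2


def broken(task):
    raise ConnectionError("server unreachable")


if __name__ == "__main__":
    print(run_with_duplicates(21, [stalled, broken, healthy], timeout_s=0.2))
```

In this sketch the stalled and broken replicas do not delay the result, because the caller returns as soon as any replica succeeds. The cost is redundant work on the other servers, which is the trade-off duplicate assignment accepts in exchange for steadier task throughput.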
Added 12 Dec 2010
Updated 12 Dec 2010
Type Journal
Year 2006
Where GRID
Authors Yusuke Tanimura, Tsutomu Ikegami, Hidemoto Nakada, Yoshio Tanaka, Satoshi Sekiguchi