Sciweavers

3PGCIC
2010

Using a Failure History Service for Reliable Grid Node Information

Reliability is a difficult but important challenge in Grid systems, which are highly dynamic and may be composed of thousands of nodes. Failure management is a key component of any attempt to provide a reliable environment, but it depends on accurate failure information about the nodes in the Grid, which is very difficult to obtain in large-scale systems. This paper proposes a failure history service for sharing failure information, which is critical to managing resources in large-scale distributed systems, and thereby improves overall reliability. This novel service keeps the information about a node's current state, as well as its failure history, as accurate as possible even in the face of a large number of node failures. The solution aims to increase the reliability of Grid systems by providing accurate data that can be used to analyze failures over time.
Catalin Leordeanu, Valentin Cristea, Thomas Ropars
Added 09 Feb 2011
Updated 09 Feb 2011
Type Journal
Year 2010
Where 3PGCIC
Authors Catalin Leordeanu, Valentin Cristea, Thomas Ropars, Yvon Jégou, Christine Morin