Reliability in Grid systems is a difficult and important challenge, particularly in highly dynamic systems composed of thousands of nodes. Failure management is a key component of any attempt to provide a reliable environment, but it depends on accurate failure information about the nodes in the Grid, which is very difficult to obtain in large-scale systems. This paper proposes a failure history service for sharing the failure information that is critical to managing resources in large-scale distributed systems. This novel service keeps the information about the current state of each node, as well as its failure history, as accurate as possible even in the face of a large number of node failures. By providing accurate data that can be used to analyze failures over time, the solution aims to increase the overall reliability of Grid systems.