Sciweavers

SRDS
2010
IEEE

Invariants Based Failure Diagnosis in Distributed Computing Systems

This paper presents an instance-based approach to diagnosing failures in computing systems. Because a large portion of failures are recurrences of earlier ones, our method takes advantage of past experience by storing historical failures in a database and retrieving similar instances when a new failure occurs. We extract the system "invariants" by modeling consistent dependencies between system attributes during operation, and construct a network graph from the learned invariants. When a failure happens, the status of the invariant network, i.e., whether each invariant link is broken or not, provides a view of the failure's characteristics. We use a high-dimensional binary vector to store this failure evidence, and develop a novel algorithm to efficiently retrieve failure signatures from the database. Experimental results on a web-based system demonstrate the effectiveness of our method in diagnosing injected failures.
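To make the signature idea concrete, here is a minimal sketch of storing broken-invariant evidence as a binary vector and looking up the closest stored failures. It is an illustration only: the abstract does not describe the paper's retrieval algorithm, so a brute-force Hamming-distance ranking stands in for it. The function names (`signature`, `hamming`, `retrieve`), the failure labels, and the invariant indices are all hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def signature(broken: Iterable[int], n_invariants: int) -> int:
    """Pack the indices of broken invariants into an integer bitmask.

    Bit i is 1 when invariant link i is broken, 0 when it still holds.
    """
    bits = 0
    for i in broken:
        if not 0 <= i < n_invariants:
            raise ValueError(f"invariant index {i} out of range")
        bits |= 1 << i
    return bits


def hamming(a: int, b: int) -> int:
    """Count the invariant links whose broken/intact status differs."""
    return bin(a ^ b).count("1")


def retrieve(query: int, database: Dict[str, int], k: int = 1) -> List[Tuple[str, int]]:
    """Return the k stored failures closest to the query signature.

    Assumption: a plain linear scan ranked by Hamming distance, with
    ties broken by label. This is NOT the paper's retrieval algorithm.
    """
    ranked = sorted(
        database.items(),
        key=lambda item: (hamming(query, item[1]), item[0]),
    )
    return [(label, hamming(query, sig)) for label, sig in ranked[:k]]


# Hypothetical history of diagnosed failures over 8 invariant links.
N_INVARIANTS = 8
history = {
    "db_connection_leak": signature({0, 1, 5}, N_INVARIANTS),
    "cpu_hog": signature({2, 3}, N_INVARIANTS),
    "network_loss": signature({4, 5, 6, 7}, N_INVARIANTS),
}

# A new failure breaks links 0, 1, 4 and 5; look up the two nearest past cases.
observed = signature({0, 1, 4, 5}, N_INVARIANTS)
matches = retrieve(observed, history, k=2)
print(matches)
```

A linear scan is adequate for a small history. With many high-dimensional signatures an indexed or approximate search would be the natural replacement, which is presumably the kind of efficiency concern the abstract's retrieval algorithm addresses.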
Haifeng Chen, Guofei Jiang, Kenji Yoshihira, Akhilesh Saxena
Added 15 Feb 2011
Updated 15 Feb 2011
Type Journal
Year 2010
Where SRDS
Authors Haifeng Chen, Guofei Jiang, Kenji Yoshihira, Akhilesh Saxena