Anomaly localization in large-scale clusters

A critical problem in managing large-scale clusters is identifying the location of problems in a system when unusual events occur. High-performance computing (HPC) systems keep growing in scale. When a system fails to function properly, health-related data are collected for troubleshooting. However, due to the massive quantities of information obtained from a large number of components, the root causes of anomalies are often buried like needles in a haystack. In this paper, we present a localization method that automatically finds the potential root causes (i.e., a subset of nodes) of the problem from the overwhelming amount of data collected system-wide. System managers can focus on examining these potential locations, thereby significantly reducing the human effort required for anomaly localization. Our method consists of three interrelated steps: (1) feature collection to assemble a feature space for the system; (2) feature extraction to obtain the most...
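To make the pipeline described in the abstract more concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' implementation. The input schema, the function names (`collect_features`, `extract_features`, `rank_suspects`), and the sample metrics are all hypothetical. Feature extraction is assumed here to mean dropping constant metrics and z-scoring the rest. The third step is truncated in the abstract above, so the ranking by distance from the per-metric median is only a stand-in that yields a subset of suspect nodes.

```python
from statistics import mean, median, pstdev


def collect_features(samples):
    """Step 1 (feature collection): build one feature vector per node.

    `samples` maps node name -> {metric name -> value}; this schema is an
    assumption made for illustration. Missing metrics default to 0.0.
    """
    metrics = sorted({m for row in samples.values() for m in row})
    matrix = {
        node: [float(row.get(m, 0.0)) for m in metrics]
        for node, row in samples.items()
    }
    return metrics, matrix


def extract_features(metrics, matrix):
    """Step 2 (feature extraction), assumed form: drop metrics that are
    constant across nodes and z-score the remaining ones."""
    columns = list(zip(*matrix.values()))
    kept = [
        (i, mean(col), pstdev(col))
        for i, col in enumerate(columns)
        if pstdev(col) > 0
    ]
    names = [metrics[i] for i, _, _ in kept]
    reduced = {
        node: [(vec[i] - mu) / sd for i, mu, sd in kept]
        for node, vec in matrix.items()
    }
    return names, reduced


def rank_suspects(reduced, top=1):
    """Stand-in for the elided third step: rank nodes by Euclidean distance
    from the per-metric median and return the `top` farthest nodes."""
    columns = list(zip(*reduced.values()))
    centre = [median(col) for col in columns]
    score = {
        node: sum((v - c) ** 2 for v, c in zip(vec, centre)) ** 0.5
        for node, vec in reduced.items()
    }
    return sorted(score, key=score.get, reverse=True)[:top]


# Hypothetical health data for four nodes; node04 deviates from its peers.
samples = {
    "node01": {"cpu": 0.50, "mem": 0.40, "temp": 60},
    "node02": {"cpu": 0.52, "mem": 0.41, "temp": 60},
    "node03": {"cpu": 0.49, "mem": 0.39, "temp": 60},
    "node04": {"cpu": 0.98, "mem": 0.95, "temp": 60},
}

metrics, matrix = collect_features(samples)
names, reduced = extract_features(metrics, matrix)
suspects = rank_suspects(reduced, top=1)
print(names, suspects)
```

In this toy data the constant `temp` metric carries no information and is dropped, and the node whose remaining metrics sit farthest from its peers is returned as the candidate location for a system manager to inspect first.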

Ziming Zheng, Yawei Li, Zhiling Lan

Cell-based Detection Algorithm | CLUSTER 2007 | Cluster Computing | Critical Problem | Large-scale Clusters |

Post Info

Added	02 Jun 2010
Updated	02 Jun 2010
Type	Conference
Year	2007
Where	CLUSTER
Authors	Ziming Zheng, Yawei Li, Zhiling Lan
