Sciweavers

90 search results - page 3 / 18
» System log pre-processing to improve failure prediction
Sort
View
CCGRID
2006
IEEE
14 years 1 months ago
Exploit Failure Prediction for Adaptive Fault-Tolerance in Cluster Computing
As the scale of cluster computing grows, it is becoming hard for long-running applications to complete without facing failures on large-scale clusters. To address this issue, chec...
Yawei Li, Zhiling Lan
DSN
2005
IEEE
14 years 1 months ago
Probabilistic QoS Guarantees for Supercomputing Systems
Supercomputing systems must be able to reliably and efficiently complete their assigned workloads, even in the presence of failures. This paper proposes a system that allows the ...
Adam J. Oliner, Larry Rudolph, Ramendra K. Sahoo, ...
ICPP
2007
IEEE
14 years 1 months ago
A Meta-Learning Failure Predictor for Blue Gene/L Systems
The demand for more computational power in science and engineering has spurred the design and deployment of ever-growing cluster systems. Even though the individual components use...
Prashasta Gujrati, Yawei Li, Zhiling Lan, Rajeev T...
ICPP
2008
IEEE
14 years 2 months ago
Dynamic Meta-Learning for Failure Prediction in Large-Scale Systems: A Case Study
Despite great efforts on the design of ultra-reliable components, the increase of system size and complexity has outpaced the improvement of component reliability. As a result, fa...
Jiexing Gu, Ziming Zheng, Zhiling Lan, John White,...
DSN
2006
IEEE
14 years 1 months ago
BlueGene/L Failure Analysis and Prediction Models
The growing computational and storage needs of several scientific applications mandate the deployment of extreme-scale parallel machines, such as IBM’s BlueGene/L which can acc...
Yinglung Liang, Yanyong Zhang, Anand Sivasubramani...