Sciweavers

11 search results - page 2 / 3
» A Meta-Learning Failure Predictor for Blue Gene L Systems
ICPPW
2008
IEEE
Simulating Failures on Large-Scale Systems
Developing fault management mechanisms is a difficult task because of the unpredictable nature of failures. In this paper, we present a fault simulation framework for Blue Gene...
Narayan Desai, Ewing L. Lusk, Daniel Buettner, And...
DSN
2009
IEEE
System log pre-processing to improve failure prediction
Log preprocessing, a process applied on the raw log before applying a predictive method, is of paramount importance to failure prediction and diagnosis. While existing filtering ...
Ziming Zheng, Zhiling Lan, Byung-Hoon Park, Al Gei...
ICCS
2005
Springer
Super-Scalable Algorithms for Computing on 100,000 Processors
In the next five years, the number of processors in high-end systems for scientific computing is expected to rise to tens and even hundreds of thousands. For example, the IBM Blu...
Christian Engelmann, Al Geist
IPPS
2010
IEEE
Scalable parallel I/O alternatives for massively parallel partitioned solver systems
With the development of high-performance computing, I/O issues have become the bottleneck for many massively parallel applications. This paper investigates scalable paral...
Jing Fu, Ning Liu, Onkar Sahni, Kenneth E. Jansen,...
DSN
2007
IEEE
What Supercomputers Say: A Study of Five System Logs
If we hope to automatically detect and diagnose failures in large-scale computer systems, we must study real deployed systems and the data they generate. Progress has been hampere...
Adam J. Oliner, Jon Stearley