Sciweavers

JSSPP
2004
Springer
14 years 29 days ago
Performance Implications of Failures in Large-Scale Cluster Scheduling
As we continue to evolve into large-scale parallel systems, many of them employing hundreds of computing engines to take on mission-critical roles, it is crucial to design those s...
Yanyong Zhang, Mark S. Squillante, Anand Sivasubra...
IPPS
2005
IEEE
14 years 1 month ago
Performance Implications of Periodic Checkpointing on Large-Scale Cluster Systems
Large-scale systems like BlueGene/L are susceptible to a number of software and hardware failures that can affect system performance. Periodic application checkpointing is a commo...
Adam J. Oliner, Ramendra K. Sahoo, José E. ...
CCGRID
2006
IEEE
14 years 1 month ago
A Failure-Aware Scheduling Strategy in Large-Scale Cluster System
As the scale is expanding, node failure becomes a commonplace feature of large-scale cluster systems. As an important part of cluster operating system software, job scheduling tak...
Linping Wu, Dan Meng, Jianfeng Zhan, Wang Lei, Bib...
ESCIENCE
2006
IEEE
14 years 1 month ago
Job Failure Analysis and Its Implications in a Large-Scale Production Grid
In this paper we present an initial analysis of job failures in a large-scale data-intensive Grid. Based on three representative periods in production, we characterize the interar...
Hui Li, David L. Groep, Lex Wolters, Jeffrey Templ...
ECRTS
2007
IEEE
14 years 1 month ago
A Hybrid Real-Time Scheduling Approach for Large-Scale Multicore Platforms
We propose a hybrid approach for scheduling real-time tasks on large-scale multicore platforms with hierarchical shared caches. In this approach, a multicore platform is partition...
John M. Calandrino, James H. Anderson, Dan P. Baum...