Sciweavers

» Systems Failures
CCGRID
2006
IEEE
14 years 1 month ago
A Failure-Aware Scheduling Strategy in Large-Scale Cluster System
As the scale is expanding, node failure becomes a commonplace feature of large-scale cluster systems. As an important part of cluster operating system software, job scheduling tak...
Linping Wu, Dan Meng, Jianfeng Zhan, Wang Lei, Bib...
CF
2009
ACM
13 years 5 months ago
High accuracy failure injection in parallel and distributed systems using virtualization
Emulation sits between simulation and experimentation to complete the set of tools available for software designers to evaluate their software and predict behavior under condition...
Thomas Hérault, Thomas Largillier, Sylvain ...
KDD
2005
ACM
14 years 1 month ago
Failure detection and localization in component based systems by online tracking
The increasing complexity of today’s systems makes fast and accurate failure detection essential for their use in mission-critical applications. Various monitoring methods provi...
Haifeng Chen, Guofei Jiang, Cristian Ungureanu, Ke...
ICPPW
2008
IEEE
14 years 2 months ago
Simulating Failures on Large-Scale Systems
Developing fault management mechanisms is a difficult task because of the unpredictable nature of failures. In this paper, we present a fault simulation framework for Blue Gene...
Narayan Desai, Ewing L. Lusk, Daniel Buettner, And...
SBACPAD
2005
IEEE
14 years 1 month ago
VRM: A Failure-Aware Grid Resource Management System
For resource management in Grid environments, advance reservations turned out to be very useful and hence are supported by a variety of Grid toolkits. However, failure ...
Lars-Olof Burchard, César A. F. De Rose, Ha...