Sciweavers

2400 search results - page 116 / 480
» Systems Failures
DSN
2005
IEEE
14 years 2 months ago
Probabilistic QoS Guarantees for Supercomputing Systems
Supercomputing systems must be able to reliably and efficiently complete their assigned workloads, even in the presence of failures. This paper proposes a system that allows the ...
Adam J. Oliner, Larry Rudolph, Ramendra K. Sahoo, ...
PPOPP
2005
ACM
14 years 2 months ago
Fault tolerant high performance computing by a coding approach
As the number of processors in today’s high performance computers continues to grow, the mean-time-to-failure of these computers is becoming significantly shorter than the exe...
Zizhong Chen, Graham E. Fagg, Edgar Gabriel, Julie...
ICDCS
2002
IEEE
14 years 2 months ago
A Practical Approach for “Zero” Downtime in an Operational Information System
An Operational Information System (OIS) supports a real-time view of an organization’s information critical to its logistical business operations. A central component of an OIS ...
Ada Gavrilovska, Karsten Schwan, Van Oleson
ICPP
2002
IEEE
14 years 2 months ago
An Efficient Fault-Tolerant Scheduling Algorithm for Real-Time Tasks with Precedence Constraints in Heterogeneous Systems
In this paper, we investigate an efficient off-line scheduling algorithm in which real-time tasks with precedence constraints are executed in a heterogeneous environment. It provi...
Xiao Qin, Hong Jiang, David R. Swanson
CLUSTER
2001
IEEE
14 years 22 days ago
GulfStream - a System for Dynamic Topology Management in Multi-domain Server Farms
This paper describes GulfStream, a scalable distributed software system designed to address the problem of managing the network topology in a multi-domain server farm. In particul...
Sameh A. Fakhouri, Germán S. Goldszmidt, Mi...