Sciweavers

Search results for: Reliable operating modes for distributed embedded systems

SIGOPS 2008
Project Kittyhawk: building a global-scale computer: Blue Gene/P as a generic computing platform
This paper describes Project Kittyhawk, an undertaking at IBM Research to explore the construction of a next-generation platform capable of hosting many simultaneous web-scale work...
Jonathan Appavoo, Volkmar Uhlig, Amos Waterland

ICS 2004, Tsinghua U.
Adaptive incremental checkpointing for massively parallel systems
Given the scale of massively parallel systems, occurrence of faults is no longer an exception but a regular event. Periodic checkpointing is becoming increasingly important in the...
Saurabh Agarwal, Rahul Garg, Meeta Sharma Gupta, J...
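Incremental checkpointing, in its generic form, saves only the state that changed since the previous checkpoint. The sketch below is a minimal illustration under stated assumptions (fixed 4 KiB blocks, SHA-256 fingerprints, and hypothetical names such as `incremental_checkpoint` and `restore`); it is not the adaptive scheme proposed in the paper, whose details the snippet does not give.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed checkpoint granularity (one memory page)


def block_hashes(state: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Fingerprint every fixed-size block of the application state."""
    return [hashlib.sha256(state[i:i + block_size]).digest()
            for i in range(0, len(state), block_size)]


def incremental_checkpoint(state: bytes, prev_hashes: list,
                           block_size: int = BLOCK_SIZE):
    """Save only blocks whose fingerprint changed since the last checkpoint.

    Returns (delta, hashes), where delta = (total_length, {index: block_bytes})
    and hashes should be passed back in as prev_hashes next time.
    """
    hashes = block_hashes(state, block_size)
    changed = {}
    for idx, h in enumerate(hashes):
        # New blocks (state grew) or blocks with a different hash are saved.
        if idx >= len(prev_hashes) or prev_hashes[idx] != h:
            changed[idx] = state[idx * block_size:(idx + 1) * block_size]
    return (len(state), changed), hashes


def restore(base: bytes, deltas: list, block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild the latest state from a full checkpoint plus ordered deltas."""
    blocks = [base[i:i + block_size] for i in range(0, len(base), block_size)]
    length = len(base)
    for length, changed in deltas:
        n_blocks = -(-length // block_size)  # ceiling division
        # Grow with empty placeholders or truncate to the recorded size.
        blocks = (blocks + [b""] * n_blocks)[:n_blocks]
        for idx, data in changed.items():
            blocks[idx] = data
    return b"".join(blocks)[:length]
```

Only the first checkpoint needs to be full; later ones carry just the modified blocks plus the state length, so `restore` can also handle state that shrinks.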

DSN 2009, IEEE
Low overhead Soft Error Mitigation techniques for high-performance and aggressive systems
The threat of soft error induced system failure in high performance computing systems has become more prominent, as we adopt ultra-deep submicron process technologies. In this pap...
Naga Durga Prasad Avirneni, Viswanathan Subramania...

TC 2010
Model-Driven System Capacity Planning under Workload Burstiness
In this paper, we define and study a new class of capacity planning models called MAP queueing networks. MAP queueing networks provide the first analytical methodology to describe ...
Giuliano Casale, Ningfang Mi, Evgenia Smirni
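A MAP (Markovian Arrival Process) is described by two matrices: D0 holds phase transitions without an arrival, and D1 holds transitions that trigger an arrival. The sketch below computes textbook descriptors of such a process (mean rate, squared coefficient of variation, and lag-1 autocorrelation of interarrival times), which is one way burstiness shows up numerically. The function name `map_descriptors` and the example matrices are assumptions for illustration; this is not the paper's capacity planning model.

```python
import numpy as np


def map_descriptors(D0: np.ndarray, D1: np.ndarray):
    """Return (rate, scv, lag1_autocorr) of a MAP given by (D0, D1)."""
    n = D0.shape[0]
    ones = np.ones(n)
    D = D0 + D1                                # generator of the phase process
    # Stationary phase distribution: pi @ D = 0 with pi summing to 1.
    A = np.vstack([D.T, ones])
    b = np.append(np.zeros(n), 1.0)
    pi = np.linalg.lstsq(A, b, rcond=None)[0]
    rate = pi @ D1 @ ones                      # mean arrival rate (lambda)
    phi = pi @ D1 / rate                       # phase distribution just after an arrival
    M = np.linalg.inv(-D0)
    m1 = phi @ M @ ones                        # E[X], equals 1 / rate
    m2 = 2.0 * phi @ M @ M @ ones              # E[X^2]
    var = m2 - m1 ** 2
    scv = var / m1 ** 2                        # squared coefficient of variation
    P = M @ D1                                 # phase transitions between arrivals
    joint = phi @ M @ P @ M @ ones             # E[X_0 X_1], consecutive gaps
    rho1 = (joint - m1 ** 2) / var             # lag-1 autocorrelation
    return rate, scv, rho1
```

For example, a two-phase MMPP with D0 = [[-10.1, 0.1], [0.1, -1.1]] and D1 = diag(10, 1) yields a mean rate of 5.5, an SCV well above 1, and a positive lag-1 autocorrelation, whereas a Poisson process gives an SCV of 1 and zero autocorrelation.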

ASPLOS 2009, ACM
ASSURE: automatic software self-healing using rescue points
Software failures in server applications are a significant problem for preserving system availability. We present ASSURE, a system that introduces rescue points that recover softw...
Stelios Sidiroglou, Oren Laadan, Carlos Perez, Nic...
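The general idea of a rescue point can be mimicked in-process: snapshot state where a function already has an error path, and on an unanticipated fault roll back and return that error instead of crashing. The decorator below is a hypothetical toy analogue in Python (the names `rescue_point`, `server_state`, and `handle_request` are invented for illustration); it does not reflect how ASSURE itself is implemented.

```python
import copy
import functools


def rescue_point(state: dict, error_value):
    """Hypothetical decorator: snapshot `state` on entry; on an unexpected
    exception, roll state back and return the function's existing error value."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            snapshot = copy.deepcopy(state)      # lightweight "checkpoint"
            try:
                return func(*args, **kwargs)
            except Exception:
                state.clear()
                state.update(snapshot)           # roll back partial updates
                return error_value               # take the anticipated error path
        return wrapper
    return decorate


server_state = {"sessions": {}}


@rescue_point(server_state, error_value=-1)
def handle_request(user: str, payload: str) -> int:
    server_state["sessions"][user] = "pending"   # partial update
    length = int(payload)                        # raises on malformed input
    server_state["sessions"][user] = "ok"
    return length
```

Calling `handle_request("bob", "oops")` raises inside `int()`, so the wrapper discards the partial `"pending"` entry and returns -1, leaving `server_state` as it was before the call.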