Sciweavers

799 search results - page 82 / 160
» On Failures and Faults
CLUSTER
2006
IEEE
13 years 8 months ago
Autonomous recovery in componentized Internet applications
In this paper we show how to reduce downtime of J2EE applications by rapidly and automatically recovering from transient and intermittent software failures, without requiring appl...
George Candea, Emre Kiciman, Shinichi Kawamoto, Ar...
ICDCS
2012
IEEE
11 years 11 months ago
Combining Partial Redundancy and Checkpointing for HPC
Today’s largest High Performance Computing (HPC) systems exceed one Petaflops (10^15 floating point operations per second) and exascale systems are projected within seven years...
James Elliott, Kishor Kharbas, David Fiala, Frank ...
ISCA
2005
IEEE
14 years 2 months ago
Design and Evaluation of Hybrid Fault-Detection Systems
As chip densities and clock rates increase, processors are becoming more susceptible to transient faults that can affect program correctness. Up to now, system designers have prim...
George A. Reis, Jonathan Chang, Neil Vachharajani,...
CSUR
2004
13 years 8 months ago
Approaches to fault-tolerant and transactional mobile agent execution: an algorithmic view
Over the past years, mobile agent technology has attracted considerable attention, and a significant body of literature has been published. To further develop mobile agent technol...
Stefan Pleisch, André Schiper
ECBS
2005
IEEE
14 years 2 months ago
Prototype of Fault Adaptive Embedded Software for Large-Scale Real-Time Systems
This paper describes a comprehensive prototype of large-scale fault adaptive embedded software developed for the proposed Fermilab BTeV high energy physics experiment. Lightweight...
Derek Messie, Mina Jung, Jae C. Oh, Shweta Shetty,...