Search Sciweavers | Sciweavers

147 search results - page 8 / 30

» Automatic recovery from software failure

SC
2009
ACM

254 views | Applied Computing

FALCON: a system for reliable checkpoint recovery in shared grid environments

16 years 11 days ago

Download cobweb.ecn.purdue.edu

In Fine-Grained Cycle Sharing (FGCS) systems, machine owners voluntarily share their unused CPU cycles with guest jobs, as long as the performance degradation is tolerable. For gu...

Tanzima Zerin Islam, Saurabh Bagchi, Rudolf Eigenm...

SERP
2003

126 views | Software Engineering

Performance of Service-Discovery Architectures in Response to Node Failures

15 years 7 months ago

Download www.itl.nist.gov

Current trends suggest future software systems will rely on service-discovery protocols to combine and recombine distributed services dynamically in reaction to changing condition...

Christopher Dabrowski, Kevin L. Mills, Andrew L. R...

SOSP
2003
ACM

138 views | Operating System

Improving the reliability of commodity operating systems

16 years 2 months ago

Download nooks.cs.washington.edu

Despite decades of research in extensible operating system technology, extensions such as device drivers remain a significant cause of system failures. In Windows XP, for example,...

Michael M. Swift, Brian N. Bershad, Henry M. Levy

IPPS
2007
IEEE

102 views | Distributed And Parallel Com...

DejaVu: Transparent User-Level Checkpointing, Migration, and Recovery for Distributed Systems

15 years 12 months ago

Download www.cecs.uci.edu

In this paper, we present a new fault tolerance system called DejaVu for transparent and automatic checkpointing, migration, and recovery of parallel and distributed applications....

Joseph F. Ruscio, Michael A. Heffner, Srinidhi Var...

DSN
2002
IEEE

77 views | Computer Networks

Reducing Recovery Time in a Small Recursively Restartable System

15 years 10 months ago

Download roc.cs.berkeley.edu

We present ideas on how to structure software systems for high availability by considering MTTR/MTTF characteristics of components in addition to the traditional criteria, such as...

George Candea, James Cutler, Armando Fox, Rushabh ...
