Sciweavers

» Systems Failures
PPOPP
2006
ACM
14 years 3 months ago
Fast and transparent recovery for continuous availability of cluster-based servers
Recently there has been renewed interest in building reliable servers that support continuous application operation. Besides maintaining system state consistent after a failure, o...
Rosalia Christodoulopoulou, Kaloian Manassiev, Ang...
PRDC
2007
IEEE
14 years 3 months ago
PAI: A Lightweight Mechanism for Single-Node Memory Recovery in DSM Servers
Several recent studies identify the memory system as the most frequent source of hardware failures in commercial servers. Techniques to protect the memory system from failures mus...
Jangwoo Kim, Jared C. Smolens, Babak Falsafi, Jame...
WDAG
2010
Springer
13 years 7 months ago
Fast Asynchronous Consensus with Optimal Resilience
We give randomized agreement algorithms with constant expected running time in asynchronous systems subject to process failures, where up to a minority of processes may f...
Ittai Abraham, Marcos Kawazoe Aguilera, Dahlia Mal...
SC
2009
ACM
14 years 3 months ago
FALCON: a system for reliable checkpoint recovery in shared grid environments
In Fine-Grained Cycle Sharing (FGCS) systems, machine owners voluntarily share their unused CPU cycles with guest jobs, as long as the performance degradation is tolerable. For gu...
Tanzima Zerin Islam, Saurabh Bagchi, Rudolf Eigenm...
CODES
2010
IEEE
13 years 7 months ago
A task remapping technique for reliable multi-core embedded systems
With the continuous scaling of semiconductor technology, the lifetime of circuits is decreasing, so processor failure becomes an important issue in MPSoC design. A software so...
Chanhee Lee, Hokeun Kim, Hae-woo Park, Sungchan Ki...