Sciweavers

799 search results - page 121 / 160
» On Failures and Faults
HPDC
2009
IEEE
14 years 3 months ago
Interconnect agnostic checkpoint/restart in Open MPI
Long-running High Performance Computing (HPC) applications at scale must be able to tolerate inevitable faults if they are to harness current and future HPC systems. Message Passi...
Joshua Hursey, Timothy Mattox, Andrew Lumsdaine
OTM
2009
Springer
14 years 3 months ago
Proactive Byzantine Quorum Systems
Byzantine Quorum Systems is a replication technique used to ensure the availability and consistency of replicated data even in the presence of arbitrary faults. This paper presents a Byzan...
Eduardo Adílio Pelinson Alchieri, Alysson N...
SRDS
2008
IEEE
14 years 3 months ago
Self-Stabilization in Tree-Structured Peer-to-Peer Service Discovery Systems
The efficiency of service discovery is critical in the development of fully decentralized middleware intended to manage large scale computational grids. This demand influenced t...
Eddy Caron, Ajoy Kumar Datta, Franck Petit, Cé...
DATE
2007
IEEE
14 years 3 months ago
Low-cost protection for SER upsets and silicon defects
Extreme transistor scaling trends in silicon technology are soon to reach a point where manufactured systems will suffer from limited device reliability and severely reduced life...
Mojtaba Mehrara, Mona Attariyan, Smitha Shyam, Kyp...
ICPADS
2002
IEEE
14 years 1 months ago
Self-Stabilizing Wormhole Routing on Ring Networks
Wormhole routing is most common in parallel architectures in which messages are sent in small fragments called flits. It is a lightweight and efficient method of routing message...
Ajoy Kumar Datta, Maria Gradinariu, Anthony B. Ken...