Sciweavers

» On Coordinated Checkpointing in Distributed Systems
AAAI
2006
13 years 10 months ago
Behaviosites: Manipulation of Multiagent System Behavior through Parasitic Infection
In this paper we present the Behaviosite Paradigm, a new approach to coordination and control of distributed agents in a multiagent system, inspired by biological parasites with b...
Amit Shabtay, Zinovi Rabinovich, Jeffrey S. Rosens...
HPDC
2009
IEEE
14 years 3 months ago
Interconnect agnostic checkpoint/restart in Open MPI
Long-running High Performance Computing (HPC) applications at scale must be able to tolerate inevitable faults if they are to harness current and future HPC systems. Message Passi...
Joshua Hursey, Timothy Mattox, Andrew Lumsdaine
APPINF
2003
13 years 10 months ago
Replication of Checkpoints in Recoverable DSM Systems
This paper presents a new technique of recovery for object-based Distributed Shared Memory (DSM) systems. The new technique, integrated with a coherence protocol for atomic consis...
Jerzy Brzezinski, Michal Szychowiak
PODC
1994
ACM
14 years 26 days ago
A Checkpoint Protocol for an Entry Consistent Shared Memory System
Workstation clusters are becoming an interesting alternative to dedicated multiprocessors. In this environment, the probability of a failure, during an application's executio...
Nuno Neves, Miguel Castro, Paulo Guedes