Sciweavers

66 search results - page 4 / 14
» The Checkpoint Problem
Sort
View
PODC
1994
ACM
13 years 11 months ago
A Checkpoint Protocol for an Entry Consistent Shared Memory System
Workstation clusters are becoming an interesting alternative to dedicated multiprocessors. In this environment, the probability of a failure, during an application's executio...
Nuno Neves, Miguel Castro, Paulo Guedes
IPPS
2007
IEEE
14 years 2 months ago
A Fault Tolerance Protocol with Fast Fault Recovery
Fault tolerance is an important issue for large machines with tens or hundreds of thousands of processors. Checkpoint-based methods, currently used on most machines, rollback all ...
Sayantan Chakravorty, Laxmikant V. Kalé
HPCA
2005
IEEE
14 years 8 months ago
Checkpointed Early Load Retirement
Long-latency loads are critical in today's processors due to the ever-increasing speed gap with memory. Not only do these loads block the execution of dependent instructions,...
Nevin Kirman, Meyrem Kirman, Mainak Chaudhuri, Jos...
SRDS
2003
IEEE
14 years 1 months ago
Raptor: Integrating Checkpoints and Thread Migration for Cluster Management
distributed shared-memory (SDSM) provides the abstraction necessary to run shared-memory applications on cost-effective parallel platforms such as clusters of workstations. Howeve...
Hazim Shafi, Evan Speight, John K. Bennett
VEE
2006
ACM
126views Virtualization» more  VEE 2006»
14 years 1 months ago
A new approach to real-time checkpointing
The progress towards programming methodologies that simplify the work of the programmer involves automating, whenever possible, activities that are secondary to the main task of d...
Antonio Cunei, Jan Vitek