Sciweavers

114 search results - page 16 / 23
HCW
2000
IEEE
14 years 1 day ago
Evaluation of PAMS' Adaptive Management Services
Management of large-scale parallel and distributed applications is an extremely complex task due to factors such as centralized management architectures, lack of coordination and ...
Yoonhee Kim, Salim Hariri, Muhamad Djunaedi
ICDCS
2000
IEEE
14 years 1 day ago
Coherence-based Coordinated Checkpointing for Software Distributed Shared Memory Systems
Fault-tolerant techniques that can cope with system failures in software distributed shared memory (SDSM) are essential for creating productive and highly available parallel compu...
Angkul Kongmunvattana, Santipong Tanchatchawal, Ni...
ICDCS
2012
IEEE
11 years 10 months ago
Combining Partial Redundancy and Checkpointing for HPC
Today’s largest High Performance Computing (HPC) systems exceed one Petaflops (10^15 floating-point operations per second) and exascale systems are projected within seven years...
James Elliott, Kishor Kharbas, David Fiala, Frank ...
PPOPP
2003
ACM
14 years 27 days ago
Automated application-level checkpointing of MPI programs
Because of increasing hardware and software complexity, the running time of many computational science applications is now more than the mean-time-to-failure of high-performance com...
Greg Bronevetsky, Daniel Marques, Keshav Pingali, ...
PODC
1990
ACM
13 years 11 months ago
Sharing Memory Robustly in Message-Passing Systems
Emulators that translate algorithms from the shared-memory model to two different message-passing models are presented. Both are achieved by implementing a wait-free, atomic, singl...
Hagit Attiya, Amotz Bar-Noy, Danny Dolev