Sciweavers

175 search results - page 4 / 35
» Scalable Fault-Tolerant Distributed Shared Memory
ICDCS
2007
IEEE
14 years 27 days ago
Fault Tolerance in Multiprocessor Systems Via Application Cloning
Record and Replay (RR) is a software based state replication solution designed to support recording and subsequent replay of the execution of unmodified applications running on mu...
Philippe Bergheaud, Dinesh Subhraveti, Marc Vertes
NOCS
2010
IEEE
13 years 4 months ago
Addressing Manufacturing Challenges with Cost-Efficient Fault Tolerant Routing
Abstract: The high-performance computing domain is being enriched by the inclusion of Networks-on-chip (NoCs) as a key component of many-core (CMPs or MPSoCs) architectures. NoCs face...
Samuel Rodrigo, Jose Flich, Antoni Roca, Simone Me...
PVM
2010
Springer
13 years 5 months ago
Dodging the Cost of Unavoidable Memory Copies in Message Logging Protocols
Abstract. With the number of computing elements spiraling to hundreds of thousands in modern HPC systems, failures are common events. Few applications are nevertheless fault toleran...
George Bosilca, Aurelien Bouteiller, Thomas Hé...
CCGRID
2006
IEEE
14 years 19 days ago
Proposal of MPI Operation Level Checkpoint/Rollback and One Implementation
With the increasing number of processors in modern HPC (High Performance Computing) systems, there are two emergent problems to solve. One is scalability, the other is fault tolera...
Yuan Tang, Graham E. Fagg, Jack Dongarra
DEXAW
2002
IEEE
13 years 11 months ago
ESOW: Parallel/Distributed Programming on the Web
This paper presents an environment for supporting parallel/distributed programming using Java with RMI and RMI-IIOP (CORBA). The environment implements the notion of Shared Object...
Denivaldo Lopes, Slimane Hammoudi, Zair Abdelouaha...