Sciweavers

» Hybrid checkpointing for parallel applications in cluster fe...
SRDS
2003
IEEE
14 years 24 days ago
Raptor: Integrating Checkpoints and Thread Migration for Cluster Management
Software distributed shared-memory (SDSM) provides the abstraction necessary to run shared-memory applications on cost-effective parallel platforms such as clusters of workstations. Howeve...
Hazim Shafi, Evan Speight, John K. Bennett
ICDCS
2011
IEEE
12 years 7 months ago
Provisioning a Multi-tiered Data Staging Area for Extreme-Scale Machines
Massively parallel scientific applications, running on extreme-scale supercomputers, produce hundreds of terabytes of data per run, driving the need for storage solutions to im...
Ramya Prabhakar, Sudharshan S. Vazhkudai, Youngjae...
PDPTA
2000
13 years 9 months ago
Dependable High Performance Computing on a Parallel Sysplex Cluster
In this paper we address the issue of dependable distributed high performance computing in the field of Symbolic Computation. We describe the extension of a middleware infrastructu...
Wolfgang Blochinger, Reinhard Bündgen, Andrea...
ICPADS
2007
IEEE
14 years 1 months ago
Federated Clusters Using the Transparent Remote Execution (TREx) Environment
Due to the increasing complexity of scientific models, large-scale simulation tools often require a critical amount of computational power to produce results in a reasonable amou...
Richert Wang, Enrique Cauich, Isaac D. Scherson
PODC
1994
ACM
13 years 11 months ago
A Checkpoint Protocol for an Entry Consistent Shared Memory System
Workstation clusters are becoming an interesting alternative to dedicated multiprocessors. In this environment, the probability of a failure, during an application's executio...
Nuno Neves, Miguel Castro, Paulo Guedes