Sciweavers

SC
2005
ACM
14 years 29 days ago
GLARE: A Grid Activity Registration, Deployment and Provisioning Framework
Resource management is a key concern for implementing effective Grid middleware and shielding application developers from low-level details. Existing resource managers concentrat...
Mumtaz Siddiqui, Alex Villazón, Jürgen...
MSS
2007
IEEE
14 years 1 months ago
Tornado Codes for MAID Archival Storage
This paper examines the application of Tornado Codes, a class of low-density parity-check (LDPC) erasure codes, to archival storage systems based on massive arrays of idle disks (...
Matthew Woitaszek, Henry M. Tufo
IPPS
2006
IEEE
14 years 1 months ago
Coordinated checkpoint from message payload in pessimistic sender-based message logging
Execution of MPI applications on Cluster and Grid deployments suffers from node and network failures, which motivates the use of fault-tolerant MPI implementations. Two category tec...
M. Aminian, Mohammad K. Akbari, Bahman Javadi
CCGRID
2009
IEEE
14 years 2 months ago
BLAST Application with Data-Aware Desktop Grid Middleware
There exist numerous Grid middleware systems for developing and executing programs on the computational Grid, but they still require intensive work from their users. BitDew is made to facili...
Haiwu He, Gilles Fedak, Bing Tang, Franck Cappello
HPDC
2009
IEEE
14 years 2 months ago
Interconnect agnostic checkpoint/restart in Open MPI
Long-running High Performance Computing (HPC) applications at scale must be able to tolerate inevitable faults if they are to harness current and future HPC systems. Message Passi...
Joshua Hursey, Timothy Mattox, Andrew Lumsdaine