Sciweavers

Search results for: Fault Tolerant Wide-Area Parallel Computing

PVM 2009 (Springer)
VolpexMPI: An MPI Library for Execution of Parallel Applications on Volatile Nodes
The objective of this research is to convert ordinary idle PCs into virtual clusters for executing parallel applications. The paper introduces VolpexMPI that is designed to enable ...
Troy LeBlanc, Rakhi Anand, Edgar Gabriel, Jaspal S...

HIPC 2009 (Springer)
Fast checkpointing by Write Aggregation with Dynamic Buffer and Interleaving on multicore architecture
Large scale compute clusters continue to grow to ever-increasing proportions. However, as clusters and applications continue to grow, the Mean Time Between Failures (MTBF) has redu...
Xiangyong Ouyang, Karthik Gopalakrishnan, Tejus Ga...

CCGRID 2006 (IEEE)
Closing Cluster Attack Windows Through Server Redundancy and Rotations
It is well-understood that increasing redundancy in a system generally improves the availability and dependability of the system. In server clusters, one important form of redu...
Yih Huang, David Arsenault, Arun Sood

HPDC 2010 (IEEE)
ROARS: a scalable repository for data intensive scientific computing
As scientific research becomes more data intensive, there is an increasing need for scalable, reliable, and high performance storage systems. Such data repositories must provide b...
Hoang Bui, Peter Bui, Patrick J. Flynn, Douglas Th...

ICDCS 2008 (IEEE)
stdchk: A Checkpoint Storage System for Desktop Grid Computing
Checkpointing is an indispensable technique to provide fault tolerance for long-running high-throughput applications like those running on desktop grids. This paper argues that...
Samer Al-Kiswany, Matei Ripeanu, Sudharshan S. Vaz...