Sciweavers

146 search results - page 2 / 30
» Transparent Checkpoint-Restart of Distributed Applications o...
Sort
View
HIPC
2009
Springer
13 years 7 months ago
Fast checkpointing by Write Aggregation with Dynamic Buffer and Interleaving on multicore architecture
Large scale compute clusters continue to grow to ever-increasing proportions. However, as clusters and applications continue to grow, the Mean Time Between Failures (MTBF) has redu...
Xiangyong Ouyang, Karthik Gopalakrishnan, Tejus Ga...
ICPP
2009
IEEE
14 years 4 months ago
Accelerating Checkpoint Operation by Node-Level Write Aggregation on Multicore Systems
—Clusters and applications continue to grow in size while their mean time between failure (MTBF) is getting smaller. Checkpoint/Restart is becoming increasingly important for lar...
Xiangyong Ouyang, Karthik Gopalakrishnan, Dhabales...
PVM
2005
Springer
14 years 3 months ago
Scalable Fault Tolerant MPI: Extending the Recovery Algorithm
ct Fault Tolerant MPI (FT-MPI)[6] was designed as a solution to allow applications different methods to handle process failures beyond simple check-point restart schemes. The init...
Graham E. Fagg, Thara Angskun, George Bosilca, Jel...
IPPS
2005
IEEE
14 years 3 months ago
User Transparent Parallel Processing of the 2004 NIST TRECVID Data Set
The Parallel-Horus framework, developed at the University of Amsterdam, is a unique software architecture that allows non-expert parallel programmers to develop fully sequential m...
Frank J. Seinstra, Cees Snoek, Dennis Koelma, Jan-...
PPOPP
2006
ACM
14 years 3 months ago
Fast and transparent recovery for continuous availability of cluster-based servers
Recently there has been renewed interest in building reliable servers that support continuous application operation. Besides maintaining system state consistent after a failure, o...
Rosalia Christodoulopoulou, Kaloian Manassiev, Ang...