Sciweavers

3886 search results - page 42 / 778
» Implementing Fault-Tolerant Distributed Applications
Sort
View
TROB
2002
120views more  TROB 2002»
13 years 7 months ago
DPAC: an object-oriented distributed and parallel computing framework for manufacturing applications
Parallel and distributed computing infrastructure are increasingly being embraced in the context of manufacturing applications, including real-time scheduling. In this paper, we pr...
N. R. Srinivasa Raghavan, Tanmay Waghmare
HIPC
2009
Springer
13 years 5 months ago
Fast checkpointing by Write Aggregation with Dynamic Buffer and Interleaving on multicore architecture
Large scale compute clusters continue to grow to ever-increasing proportions. However, as clusters and applications continue to grow, the Mean Time Between Failures (MTBF) has redu...
Xiangyong Ouyang, Karthik Gopalakrishnan, Tejus Ga...
ICDCS
2012
IEEE
11 years 10 months ago
Combining Partial Redundancy and Checkpointing for HPC
Today’s largest High Performance Computing (HPC) systems exceed one Petaflops (1015 floating point operations per second) and exascale systems are projected within seven years...
James Elliott, Kishor Kharbas, David Fiala, Frank ...
NGC
1998
Springer
171views Communications» more  NGC 1998»
13 years 7 months ago
Programming Languages for Distributed Applications
Much progress has been made in distributed computing in the areas of distribution structure, open computing, fault tolerance, and security. Yet, writing distributed applications r...
Seif Haridi, Peter Van Roy, Per Brand, Christian S...
IPPS
1998
IEEE
13 years 12 months ago
Migration and Rollback Transparency for Arbitrary Distributed Applications in Workstation Clusters
Programmers and users of compute intensive scientific applications often do not want to (or even cannot) code load balancing and fault tolerance into their programs. The PBEAM syst...
Stefan Petri, Matthias Bolz, Horst Langendörf...