Sciweavers

212 search results - page 13 / 43
» Supporting fault tolerance in a data-intensive computing mid...
GPC
2007
Springer
14 years 1 month ago
A Novel Data Grid Coherence Protocol Using Pipeline-Based Aggressive Copy Method
Grid systems are well known for their high-performance computing or large data storage with inexpensive devices. They can be categorized into two major types: computational grid and ...
Reen-Cheng Wang, Su-Ling Wu, Ruay-Shiung Chang
IPPS
2007
IEEE
14 years 1 month ago
The Design and Implementation of Checkpoint/Restart Process Fault Tolerance for Open MPI
To be able to fully exploit ever larger computing platforms, modern HPC applications and system software must be able to tolerate inevitable faults. Historically, MPI implementati...
Joshua Hursey, Jeffrey M. Squyres, Timothy Mattox,...
CLUSTER
2002
IEEE
14 years 16 days ago
BioOpera: Cluster-Aware Computing
In this paper we present BioOpera, an extensible process support system for cluster-aware computing. It features an intuitive way to specify computations, as well as improved supp...
Win Bausch, Cesare Pautasso, Reto Schaeppi, Gustav...
OTM
2009
Springer
14 years 2 months ago
Evaluating Throughput Stability of Protocols for Distributed Middleware
Communication of large data volumes is a core functionality of distributed systems middleware, namely, for interconnecting components, for distributed computation and for fault tol...
Nuno Carvalho, José P. Oliveira, José...
DSN
2007
IEEE
14 years 1 month ago
A Tunable Add-On Diagnostic Protocol for Time-Triggered Systems
We present a tunable diagnostic protocol for generic time-triggered (TT) systems to detect crash and send/receive omission faults. Compared to existing diagnostic and membership p...
Marco Serafini, Neeraj Suri, Jonny Vinter, Astrit ...