Sciweavers

207 search results - page 7 / 42
» High accuracy failure injection in parallel and distributed ...
CLOUD
2010
ACM
14 years 22 days ago
Lithium: virtual machine storage for the cloud
To address the limitations of centralized shared storage for cloud computing, we are building Lithium, a distributed storage system designed specifically for virtualization workl...
Jacob Gorm Hansen, Eric Jul
EUROPAR
2005
Springer
14 years 1 month ago
Faults in Large Distributed Systems and What We Can Do About Them
Scientists are increasingly using large distributed systems built from commodity off-the-shelf components to perform scientific computation. Grid computing has expanded the scale ...
George Kola, Tevfik Kosar, Miron Livny
IPPS
2005
IEEE
14 years 1 month ago
Fault-Tolerant Parallel Applications with Dynamic Parallel Schedules
Commodity computer clusters are often composed of hundreds of computing nodes. These generally off-the-shelf systems are not designed for high reliability. Node failures therefore...
Sebastian Gerlach, Roger D. Hersch
PDPTA
2007
13 years 9 months ago
Python-based Distributed Programming with Trickle
Trickle is an extension to the Python programming language that provides explicit but simple mechanisms to write distributed scripts and programs. Trickle links together...
Gregory Benson, Alexey Fedosov
CLUSTER
2006
IEEE
14 years 1 month ago
JOSHUA: Symmetric Active/Active Replication for Highly Available HPC Job and Resource Management
Most of today's HPC systems employ a single head node for control, which represents a single point of failure as it interrupts an entire HPC system upon failure. Furthermore, it...
Kai Uhlemann, Christian Engelmann, Stephen L. Scot...