Sciweavers

» High accuracy failure injection in parallel and distributed ...
CLOUD
2010
ACM
Lithium: virtual machine storage for the cloud
To address the limitations of centralized shared storage for cloud computing, we are building Lithium, a distributed storage system designed specifically for virtualization workl...
Jacob Gorm Hansen, Eric Jul
EUROPAR
2005
Springer
Faults in Large Distributed Systems and What We Can Do About Them
Scientists are increasingly using large distributed systems built from commodity off-the-shelf components to perform scientific computation. Grid computing has expanded the scale ...
George Kola, Tevfik Kosar, Miron Livny
IPPS
2005
IEEE
Fault-Tolerant Parallel Applications with Dynamic Parallel Schedules
Commodity computer clusters are often composed of hundreds of computing nodes. These systems, generally built from off-the-shelf components, are not designed for high reliability. Node failures therefore...
Sebastian Gerlach, Roger D. Hersch
PDPTA
2007
Python-based Distributed Programming with Trickle
Trickle is an extension to the Python programming language that provides explicit but simple mechanisms for writing distributed scripts and programs. Trickle links together...
Gregory Benson, Alexey Fedosov
CLUSTER
2006
IEEE
JOSHUA: Symmetric Active/Active Replication for Highly Available HPC Job and Resource Management
Most of today's HPC systems employ a single head node for control, which is a single point of failure: if it fails, the entire HPC system is interrupted. Furthermore, it...
Kai Uhlemann, Christian Engelmann, Stephen L. Scot...