Sciweavers

207 search results - page 15 / 42
» High accuracy failure injection in parallel and distributed ...
ASPLOS
2009
ACM
14 years 8 months ago
ASSURE: automatic software self-healing using rescue points
Software failures in server applications are a significant problem for preserving system availability. We present ASSURE, a system that introduces rescue points that recover softw...
Stelios Sidiroglou, Oren Laadan, Carlos Perez, Nic...
ICDCS
2012
IEEE
11 years 10 months ago
Securing Virtual Coordinates by Enforcing Physical Laws
Virtual coordinate systems (VCS) provide accurate estimations of latency between arbitrary hosts on a network, while conducting a small amount of actual measurements and relying...
Jeffrey Seibert, Sheila Becker, Cristina Nita-Rota...
PDPTA
2010
13 years 5 months ago
Collecting Sensor Data for High-Performance Computing: A Case-study
Many research questions remain open with regard to improving reliability in exascale systems. Among others, statistics-based analysis has been used to find anomalies, to isolate ...
Line C. Pouchard, Jonathan D. Dobson, Stephen W. P...
CONCURRENCY
1998
13 years 7 months ago
JPVM: network parallel computing in Java
The JPVM library is a software system for explicit message-passing based distributed memory MIMD parallel programming in Java. The library supports an interface similar to the C a...
Adam Ferrari
IPPS
2005
IEEE
14 years 1 month ago
A Performance Comparison of Tree and Ring Topologies in Distributed Systems
A distributed system is a collection of computers that are connected via a communication network. Distributed systems have become commonplace due to the wide availability of low-c...
Min Huang, Brett Bode