Sciweavers

ASPLOS
2009
ACM
16 years 6 months ago
ASSURE: automatic software self-healing using rescue points
Software failures in server applications are a significant problem for preserving system availability. We present ASSURE, a system that introduces rescue points that recover softw...
Stelios Sidiroglou, Oren Laadan, Carlos Perez, Nic...
ICDCS
2012
IEEE
13 years 7 months ago
Securing Virtual Coordinates by Enforcing Physical Laws
Virtual coordinate systems (VCS) provide accurate estimations of latency between arbitrary hosts on a network, while conducting a small amount of actual measurements and relying...
Jeffrey Seibert, Sheila Becker, Cristina Nita-Rota...
PDPTA
2010
15 years 3 months ago
Collecting Sensor Data for High-Performance Computing: A Case-study
Many research questions remain open with regard to improving reliability in exascale systems. Among others, statistics-based analysis has been used to find anomalies, to isolate ...
Line C. Pouchard, Jonathan D. Dobson, Stephen W. P...
CONCURRENCY
1998
15 years 5 months ago
JPVM: network parallel computing in Java
The JPVM library is a software system for explicit message-passing-based, distributed-memory MIMD parallel programming in Java. The library supports an interface similar to the C a...
Adam Ferrari
IPPS
2005
IEEE
15 years 11 months ago
A Performance Comparison of Tree and Ring Topologies in Distributed Systems
A distributed system is a collection of computers that are connected via a communication network. Distributed systems have become commonplace due to the wide availability of low-c...
Min Huang, Brett Bode