Sciweavers

207 search results - page 27 / 42
» High accuracy failure injection in parallel and distributed ...
PODC
2011
ACM
12 years 10 months ago
Robust network supercomputing without centralized control
Internet supercomputing is becoming an increasingly popular means for harnessing the power of a vast number of interconnected computers. This comes at a cost substantially lower t...
Seda Davtyan, Kishori M. Konwar, Alexander A. Shva...
IPPS
1998
IEEE
13 years 12 months ago
Fault-Tolerant Switched Local Area Networks
The RAIN (Reliable Array of Independent Nodes) project at Caltech is focusing on creating highly reliable distributed systems by leveraging commercially available personal compute...
Paul S. LeMahieu, Vasken Bohossian, Jehoshua Bruck
NPC
2004
Springer
14 years 1 month ago
A Fully Adaptive Fault-Tolerant Routing Methodology Based on Intermediate Nodes
Massively parallel computing systems are being built with thousands of nodes. Because of the high number of components, it is critical to keep these systems running even in the pre...
Nils Agne Nordbotten, María Engracia Gó...
IPPS
2008
IEEE
14 years 2 months ago
A plug-and-play model for evaluating wavefront computations on parallel architectures
This paper develops a plug-and-play reusable LogGP model that can be used to predict the runtime and scaling behavior of different MPI-based pipelined wavefront applications runni...
Gihan R. Mudalige, Mary K. Vernon, Stephen A. Jarv...
PODC
2005
ACM
14 years 1 month ago
Routing complexity of faulty networks
One of the fundamental problems in distributed computing is how to efficiently perform routing in a faulty network in which each link fails with some probability. This paper inves...
Omer Angel, Itai Benjamini, Eran Ofek, Udi Wieder