Fault tolerance is an important issue for large machines with tens or hundreds of thousands of processors. Checkpoint-based methods, currently used on most machines, rollback all ...
evel of abstraction, we can represent a workflow as a directed graph with operators (or tasks) at the vertices (see Figure 1). Each operator takes inputs from data sources or from ...
In this paper, we present DeployWare to address the deployment of distributed and heterogeneous software systems on large scale infrastructures such as grids. Deployment of softwa...
The SB-PRAM is a massively parallel, uniform memory access (UMA) shared memory computer. The main ideas of the design are multithreading on instruction level, hashing of the addre...
This paper describes the principles of an original adaptive interconnect for a computational cluster. Torus topology (2d or 3d) is used as a basis but nodes are allowed to effecti...