Many existing clusters use inexpensive Gigabit Ethernet and often have multiple interfaces cards to improve bandwidth and enhance fault tolerance. We investigate the use of Concurr...
Brad Penoff, Mike Tsai, Janardhan R. Iyengar, Alan...
Robustness, through fault tolerance, is a property often put forward in order to advocate MAS. The question is: What is the first step to be fault tolerant? Obviously the answer i...
Katia Potiron, Patrick Taillibert, Amal El Fallah-...
The design of safety-critical systems has typically adopted static techniques to simplify error detection and fault tolerance. However, economic pressure to reduce costs is exposi...
Constructing logical machines out of collections of physical machines is a well-known technique for improving the robustness and fault tolerance of distributed systems. We present...
Yair Amir, Brian A. Coan, Jonathan Kirsch, John La...
This paper proposes GRIDTS, a grid infrastructure in which the resources select the tasks they execute, on the contrary to traditional infrastructures where schedulers find resou...
This paper examines the application of Tornado Codes, a class of low density parity check (LDPC) erasure codes, to archival storage systems based on massive arrays of idle disks (...
Web services have been pointed as a suitable technology for the development and execution of distributed applications. However, the Web service architecture still lacks facilities...
In this paper we address the problem of reducing the energy consumption in distributed embedded systems associated with time-constraints and equipped with fault-tolerant technique...
Abstract—A Network-on-Chip (NoC) replaces on-chip communication implemented by point-to-point interconnects in a multi-core environment by a set of shared interconnects connected...
Scheduling tasks on large-scale computational grids is difficult due to the heterogeneous computational capabilities of the resources, node unavailability and unreliable network ...