Sciweavers

» Cache based fault recovery for distributed systems
VLDB
1997
ACM
14 years 17 days ago
Integrating Reliable Memory in Databases
Abstract. Recent results in the Rio project at the University of Michigan show that it is possible to create an area of main memory that is as safe as disk from operating system cr...
Wee Teck Ng, Peter M. Chen
CORR
2008
Springer
13 years 8 months ago
Algorithmic Based Fault Tolerance Applied to High Performance Computing
We present a new approach to fault tolerance for High Performance Computing systems. Our approach is based on a careful adaptation of the Algorithmic Based Fault Tolerance techniq...
George Bosilca, Remi Delmas, Jack Dongarra, Julien...
ICAC
2005
IEEE
14 years 2 months ago
Distributed Troubleshooting Agents
Key issues to address in autonomic job recovery for cluster computing are recognizing job failure; understanding the failure sufficiently to know if and how to restart the job; an...
Charles Earl, Emilio Remolina, Jim Ong, John Brown
IPPS
1999
IEEE
14 years 22 days ago
Dependability Evaluation of Fault Tolerant Distributed Industrial Control Systems
Modern distributed industrial control systems need improvements in their dependability. In this paper we study the dependability of a fault tolerant distributed industrial control ...
José Carlos Campelo, Pedro Yuste, Francisco...
CIS
2005
Springer
14 years 2 months ago
Survivability Computation of Networked Information Systems
Abstract. For networked information systems, survivability should be considered beyond security: it emphasizes the ability to continue providing timely services in malicious en...
Xuegang Lin, Rongsheng Xu, Miaoliang Zhu