We discuss research issues and models for vulnerabilities and threats in distributed computing systems. We present four diverse approaches to reducing system vulnerabilities and th...
The paper describes a metaobject architecture for distributed fault tolerant systems. Basically metaobject protocols enables functional objects to be independent from meta-function...
Abstract--We present a fault tolerant task pool execution environment that is capable of performing fine-grain selective restart using a lightweight, distributed task completion tr...
James Dinan, Arjun Singri, P. Sadayappan, Sriram K...
This paper presents a software architecture for hardware fault tolerance based on loosely-synchronized, redundant virtual machines (LSRVM). LSRVM will provide high levels of relia...
The probability that a failure will occur before the end of the computation increases as the number of processors used in a high performance computing application increases. For l...