We consider the problem of dependable computation with multiple inputs. The goal is to study when redundancy can help to achieve survivability and when it cannot. We use AND/OR gra...
Parallel computing on volatile distributed resources requires schedulers that consider job and resource characteristics. We study unconventional computing environments containing ...
Brent Rood, Nathan Gnanasambandam, Michael J. Lewi...
Enterprises may use a service-oriented architecture (SOA) to provide a streamlined interface to their business processes. To scale up the system, each tier in a composite service ...
Large-scale parallel computing is relying increasingly on clusters with thousands of processors. At such large counts of compute nodes, faults are becoming common place. Current t...
Arun Babu Nagarajan, Frank Mueller, Christian Enge...
The demand for an efficient fault tolerance system has led to the development of complex monitoring infrastructure, which in turn has created an overwhelming task of data and even...