Sciweavers

» Fault-Tolerant Resource Reasoning
EMSOFT
2007
Springer
14 years 4 months ago
A dynamic scheduling approach to designing flexible safety-critical systems
The design of safety-critical systems has typically adopted static techniques to simplify error detection and fault tolerance. However, economic pressure to reduce costs is exposi...
Luís Almeida, Sebastian Fischmeister, Madhu...
ICDCS
2012
IEEE
12 years 7 days ago
Combining Partial Redundancy and Checkpointing for HPC
Today’s largest High Performance Computing (HPC) systems exceed one Petaflops (10^15 floating point operations per second) and exascale systems are projected within seven years...
James Elliott, Kishor Kharbas, David Fiala, Frank ...
CLUSTER
2002
IEEE
13 years 9 months ago
Condor-G: A Computation Management Agent for Multi-Institutional Grids
In recent years, there has been a dramatic increase in the amount of available computing and storage resources. Yet few have been able to exploit these resources in an aggregated ...
James Frey, Todd Tannenbaum, Miron Livny, Ian T. F...
ICPADS
2002
IEEE
14 years 2 months ago
Sago: A Network Resource Management System for Real-Time Content Distribution
Content replication and distribution is an effective technology to reduce the response time for web accesses and has been proven quite popular among large Internet cont...
Tzi-cker Chiueh, Kartik Gopalan, Anindya Neogi, Ch...
CCGRID
2006
IEEE
14 years 1 month ago
IPMI-based Efficient Notification Framework for Large Scale Cluster Computing
The demand for an efficient fault tolerance system has led to the development of complex monitoring infrastructure, which in turn has created an overwhelming task of data and even...
Chokchai Leangsuksun, Tirumala Rao, Anand Tikoteka...