Sciweavers

ICNS 2009, IEEE
HGRID: Fault Tolerant, Log2N Resource Management for Grids
Grid Resource Discovery Service is currently a very important focus of research. We propose a scheme that presents essential characteristics for efficient, self-configuring and fau...
Antonia Gallardo, Kana Sanjeevan, Luis Díaz...

JCP 2006
Fault Tolerance in a Multi-Layered DRE System: A Case Study
Dynamic resource management is a crucial part of the infrastructure for emerging distributed real-time embedded systems, responsible for keeping mission-critical applications opera...
Paul Rubel, Joseph P. Loyall, Richard E. Schantz, ...

SAINT 2005, IEEE
Fault-Tolerant Routing for P2P Systems with Unstructured Topology
New application scenarios, such as Internet-scale computations, nomadic networks and mobile systems, require decentralized, scalable and open infrastructures. The peer-to-peer (P2P...
Leonardo Mariani

IPPS 2007, IEEE
The Design and Implementation of Checkpoint/Restart Process Fault Tolerance for Open MPI
To be able to fully exploit ever larger computing platforms, modern HPC applications and system software must be able to tolerate inevitable faults. Historically, MPI implementati...
Joshua Hursey, Jeffrey M. Squyres, Timothy Mattox,...

CCGRID 2006, IEEE
IPMI-based Efficient Notification Framework for Large Scale Cluster Computing
The demand for an efficient fault tolerance system has led to the development of complex monitoring infrastructure, which in turn has created an overwhelming task of data and even...
Chokchai Leangsuksun, Tirumala Rao, Anand Tikoteka...