Sciweavers

DEXAW
2005
IEEE
Grid Visualizer: A Monitoring Tool for Grid Environment
One specific problem in a wide-area distributed computing environment is the effective management of the vast amount of resources that are made available within the grid environment. Th...
Ghazala Shaheen, Muhammad Usman Malik, Zohair Ihsa...
IPPS
2006
IEEE
Algorithm-based checkpoint-free fault tolerance for parallel matrix computations on volatile resources
As the desire of scientists to perform ever larger computations drives the size of today’s high performance computers from hundreds, to thousands, and even tens of thousands of ...
Zizhong Chen, Jack Dongarra
ISPA
2007
Springer
Binomial Graph: A Scalable and Fault-Tolerant Logical Network Topology
The number of processors embedded in high performance computing platforms is growing daily to solve larger and more complex problems. The logical network topologies must also suppo...
Thara Angskun, George Bosilca, Jack Dongarra
ICDCS
1995
IEEE
Newtop: A Fault-Tolerant Group Communication Protocol
A general-purpose group communication protocol suite called Newtop is described. It is assumed that processes can simultaneously belong to many groups, group size could be large,...
Paul D. Ezhilchelvan, Raimundo A. Macêdo, Sa...
ICDCS
2007
IEEE
Fault Tolerance in Multiprocessor Systems Via Application Cloning
Record and Replay (RR) is a software-based state replication solution designed to support recording and subsequent replay of the execution of unmodified applications running on mu...
Philippe Bergheaud, Dinesh Subhraveti, Marc Vertes