Sciweavers

» High accuracy failure injection in parallel and distributed ...
IPPS
2002
IEEE
14 years 17 days ago
A Comparative Study of Parallel and Distributed Java Projects for Heterogeneous Systems
During the last few years, the concepts of cluster computing and heterogeneous networked systems have received increasing interest. The popularity of using Java for developing par...
Jameela Al-Jaroodi, Nader Mohamed, Hong Jiang, Dav...
CCGRID
2009
IEEE
14 years 11 days ago
Dynamic Provisioning of Virtual Organization Clusters
Virtual Organization Clusters are systems comprised of virtual machines that provide dedicated computing clusters for each individual Virtual Organization. The design of these clu...
Michael A. Murphy, Brandon Kagey, Michael Fenn, Se...
PPOPP
2005
ACM
14 years 1 month ago
Fault tolerant high performance computing by a coding approach
As the number of processors in today’s high performance computers continues to grow, the mean-time-to-failure of these computers is becoming significantly shorter than the exe...
Zizhong Chen, Graham E. Fagg, Edgar Gabriel, Julie...
HPCA
2009
IEEE
14 years 8 months ago
Accurate microarchitecture-level fault modeling for studying hardware faults
Decreasing hardware reliability is expected to impede the exploitation of increasing integration projected by Moore's Law. There is much ongoing research on efficient fault t...
Man-Lap Li, Pradeep Ramachandran, Ulya R. Karpuzcu...
ICS
2011
Tsinghua U.
12 years 11 months ago
High performance linpack benchmark: a fault tolerant implementation without checkpointing
The probability that a failure will occur before the end of the computation increases as the number of processors used in a high performance computing application increases. For l...
Teresa Davies, Christer Karlsson, Hui Liu, Chong D...