Sciweavers

Evaluating cooperative checkpointing for supercomputing systems
IPPS 2006, IEEE
Cooperative checkpointing, in which the system dynamically skips checkpoints requested by applications at runtime, can exploit system-level information to improve performance and ...
Adam J. Oliner, Ramendra K. Sahoo

Probabilistic Byzantine Quorum Systems
PODC 1998, ACM
In this paper, we explore techniques to detect Byzantine server failures in asynchronous replicated data services. Our goal is to detect arbitrary failures of data servers in a s...
Dahlia Malkhi, Michael K. Reiter, Avishai Wool, Re...

Parallel implementation of the replica exchange molecular dynamics algorithm on Blue Gene/L
IPPS 2006, IEEE
The Replica Exchange method is a popular approach for studying the folding thermodynamics of small to modest size proteins in explicit solvent, since it is easily parallelized. Ho...
Maria Eleftheriou, Aleksandr Rayshubskiy, Jed W. P...

Data parallelism in bioinformatics workflows using Hydra
HPDC 2010, IEEE
Large-scale bioinformatics experiments are usually composed of a set of data flows generated by a chain of activities (programs or services) that may be modeled as scientific work...
Fábio Coutinho, Eduardo S. Ogasawara, Danie...

Towards Autonomic Computing: Adaptive Job Routing and Scheduling
AAAI 2004
Computer systems are rapidly becoming so complex that maintaining them with human support staffs will be prohibitively expensive and inefficient. In response, visionaries have beg...
Shimon Whiteson, Peter Stone