Sciweavers

» Performance under Failures of DAG-based Parallel Computing
CLOUDCOM
2010
Springer
13 years 5 months ago
Bag-of-Tasks Scheduling under Budget Constraints
Commercial cloud offerings, such as Amazon's EC2, let users allocate compute resources on demand, charging based on reserved time intervals. While this gives great flexibilit...
Ana-Maria Oprescu, Thilo Kielmann
ICPP
2008
IEEE
14 years 1 months ago
Dynamic Meta-Learning for Failure Prediction in Large-Scale Systems: A Case Study
Despite great efforts on the design of ultra-reliable components, the increase of system size and complexity has outpaced the improvement of component reliability. As a result, fa...
Jiexing Gu, Ziming Zheng, Zhiling Lan, John White,...
HPCA
1999
IEEE
13 years 11 months ago
Permutation Development Data Layout (PDDL)
Declustered data organizations in disk arrays (RAIDs) achieve less-intrusive reconstruction of data after a disk failure. We present PDDL, a new data layout for declustered disk a...
Thomas J. E. Schwarz, Jesse Steinberg, Walter A. B...
ICDCS
1997
IEEE
13 years 11 months ago
Supporting Dynamic Space-sharing on Clusters of Non-dedicated Workstations
Clusters of workstations are increasingly being viewed as a cost-effective alternative to parallel supercomputers. However, resource management and scheduling on workstations clust...
Abdur Chowdhury, Lisa D. Nicklas, Sanjeev Setia, E...
IPPS
2007
IEEE
14 years 1 months ago
Fast Failure Detection in a Process Group
Failure detectors represent a very important building block in distributed applications. The speed and the accuracy of the failure detectors are critical to the performance of the ...
Xinjie Li, Monica Brockmeyer