Sciweavers

» Fault Tolerant Wide-Area Parallel Computing
IPPS
1998
IEEE
14 years 1 month ago
A Generalized Forward Recovery Checkpointing Scheme
We propose a generalized forward recovery checkpointing scheme, with lookahead execution and rollback validation. This method takes advantage of voting and comparison on multiple v...
Ke Huang, Jie Wu, Eduardo B. Fernández
GCC
2003
Springer
14 years 2 months ago
Grid Computing for the Masses: An Overview
The common goals of the Grid and peer-to-peer communities have brought them in close proximity. Both technologies overlay a collaborative resource-sharing infrastruct...
Kaizar Amin, Gregor von Laszewski, Armin R. Mikler
HPDC
2008
IEEE
14 years 4 months ago
DataLab: transactional data-parallel computing on an active storage cloud
Active storage clouds are an attractive platform for executing the large data-intensive workloads found in many fields of science. However, active storage presents new system managem...
Brandon Rich, Douglas Thain
HPDC
2012
IEEE
12 years 4 hours ago
Understanding the effects and implications of compute node related failures in Hadoop
Hadoop has become a critical component in today’s cloud environment. Ensuring good performance for Hadoop is paramount for the wide range of applications built on top of it. In ...
Florin Dinu, T. S. Eugene Ng
HCW
1998
IEEE
14 years 1 month ago
CCS Resource Management in Networked HPC Systems
CCS is a resource management system for parallel high-performance computers. At the user level, CCS provides vendor-independent access to parallel systems. At the system administr...
Axel Keller, Alexander Reinefeld