In this paper, we describe the design and implementation of two mechanisms for fault-tolerance and recovery for complex scientific workflows on computational grids. We present our ...
Grid is a computational environment in which applications can use multiple distributed computational resources in a safe, coordinated, efficient and transparent way. Data Integra...
Abstract. A grid checkpointing service providing migration and transparent fault tolerance is important for distributed and parallel applications executed in heterogeneous grids. I...
Cloud computing [22] is a promising paradigm for the provisioning of IT services. Cloud computing infrastructures, such as those offered by the RESERVOIR project, aim to facilitat...
Cloud computing has the potential for tremendous benefits, but wide scale adoption has a range of challenges that must be met. We review these challenges and how they relate to sc...
The divide-and-conquer pattern of parallelism is a powerful approach to organize parallelism on problems that are expressed naturally in a recursive way. In fact, recent tools such...
Abstract--This paper examines the initial parallel implementation of SCATTER, a computationally intensive inelastic neutron scattering routine with polycrystalline averaging capabi...
Managing the execution of scientific applications in a heterogeneous grid computing environment can be a daunting task, particularly for long running jobs. Increasing fault tolera...
Abstract. Limited bandwidth to off-chip main memory is a performance bottleneck in chip multiprocessors for streaming computations, such as Cell/B.E., and this will become even mor...