In this paper, we describe the design and implementation of two mechanisms for fault tolerance and recovery for complex scientific workflows on computational grids. We present our ...
In current approaches to workflow scheduling, there is no cooperation between the distributed workflow brokers; as a result, conflicting schedules occur. To o...
The current trend in high performance computing is to aggregate ever larger numbers of processing and interconnection elements in order to achieve desired levels of compu...
Jim M. Brandt, Bert J. Debusschere, Ann C. Gentile...
This paper examines the job exchange between parallel compute sites in a decentralized Grid scenario. Here, the local scheduling system remains untouched and continues normal oper...
Christian Grimme, Joachim Lepping, Alexander Papas...
Production grids are complex and highly variable systems whose behavior is not well understood and difficult to anticipate. The goal of this study is to estimate the impact of the ...
Fault tolerance is one of the key issues for large scale applications executed on high performance computing systems. In a cluster federation, clusters are gathered to provide hug...
Prefetching is an effective technique for improving file access performance because it can reduce access latency in I/O systems. In distributed storage systems, prefetching for metadat...
Lin Lin, Xueming Li, Hong Jiang, Yifeng Zhu, Lei T...
When orchestrating data-centric workflows, such as those commonly found in the sciences, centralised servers can become a bottleneck to the performance of a workflow; output from service i...
This paper presents the design and implementation of a new file-system-independent collective I/O optimization based on file views: view-based collective I/O. View-based collective...