Sciweavers

CCGRID
2010
IEEE
13 years 9 months ago
Selective Recovery from Failures in a Task Parallel Programming Model
Abstract. We present a fault-tolerant task pool execution environment that is capable of performing fine-grain selective restart using a lightweight, distributed task completion tr...
James Dinan, Arjun Singri, P. Sadayappan, Sriram K...
ICA3PP
2010
Springer
13 years 8 months ago
Checkpointing and Migration of Communication Channels in Heterogeneous Grid Environments
Abstract. A grid checkpointing service providing migration and transparent fault tolerance is important for distributed and parallel applications executed in heterogeneous grids. I...
John Mehnert-Spahn, Michael Schoettner
ATAL
2005
Springer
14 years 2 months ago
A distributed services based conference planner application using software agents, grid services and web services
This demonstration highlights the applications of our research work, i.e., the second-generation (Scalable Fault Tolerant Agent Grooming Environment – SAGE) Multi Agent System, Integr...
M. Omair Shafiq, Arshad Ali, Amina Tariq, Amna Bas...
CCGRID
2006
IEEE
14 years 2 months ago
Proposal of MPI Operation Level Checkpoint/Rollback and One Implementation
With the increasing number of processors in modern HPC (High Performance Computing) systems, there are two emerging problems to solve. One is scalability, the other is fault tolera...
Yuan Tang, Graham E. Fagg, Jack Dongarra
HIPC
2004
Springer
14 years 1 month ago
Lock-Free Parallel Algorithms: An Experimental Study
Abstract. Lock-free shared data structures in the setting of distributed computing have received a fair amount of attention. Major motivations of lock-free data structures include ...
Guojing Cong, David A. Bader