Sciweavers

482 search results - page 24 / 97
» A large-scale study of failures in high-performance computin...
DSN
2003
IEEE
15 years 8 months ago
Comparison of Failure Detectors and Group Membership: Performance Study of Two Atomic Broadcast Algorithms
Protocols that solve agreement problems are essential building blocks for fault tolerant distributed systems. While many protocols have been published, little has been done to ana...
Péter Urbán, Ilya Shnayderman, Andr...
IJHPCA
2006
15 years 3 months ago
A Pragmatic Analysis Of Scheduling Environments On New Computing Platforms
Today, large scale parallel systems are available at relatively low cost. Many powerful such systems have been installed all over the world and the number of users is always incre...
Lionel Eyraud
CCGRID
2008
IEEE
15 years 5 months ago
Fault Tolerance in Cluster Federations with O2P-CF
Fault tolerance is one of the key issues for large scale applications executed on high performance computing systems. In a cluster federation, clusters are gathered to provide hug...
Thomas Ropars, Christine Morin
ATAL
2005
Springer
15 years 9 months ago
Task inference and distributed task management in the Centibots robotic system
We describe a very large scale distributed robotic system, involving a team of over 100 robots, that has been successfully deployed in large, unknown indoor environments, over ext...
Charlie Ortiz, Régis Vincent, Benoit Moriss...
ISPDC
2008
IEEE
15 years 9 months ago
Performance Analysis of Grid DAG Scheduling Algorithms using MONARC Simulation Tool
This paper presents a new approach for analyzing the performance of grid scheduling algorithms for tasks with dependencies. Finding the optimal procedures for DAG scheduling in Gr...
Florin Pop, Ciprian Dobre, Valentin Cristea