Sciweavers

482 search results - page 47 / 97
IPPS
2010
IEEE
15 years 1 month ago
PreDatA - preparatory data analytics on peta-scale machines
Peta-scale scientific applications running on High End Computing (HEC) platforms can generate large volumes of data. For high performance storage and in order to be useful to scien...
Fang Zheng, Hasan Abbasi, Ciprian Docan, Jay F. Lo...
HPDC
2006
IEEE
15 years 9 months ago
Peer to peer size estimation in large and dynamic networks: A comparative study
As the size of distributed systems keeps growing, the peer to peer communication paradigm has been identified as the key to scalability. Peer to peer overlay networks are charact...
Erwan Le Merrer, Anne-Marie Kermarrec, Laurent Mas...
ICDCS
2006
IEEE
15 years 9 months ago
Dynamic Access Control in a Content-based Publish/Subscribe System with Delivery Guarantees
Content-based publish/subscribe (pub/sub) is a promising paradigm for building asynchronous distributed applications. In many application scenarios, these systems are required to ...
Yuanyuan Zhao, Daniel C. Sturman
CLUSTER
2003
IEEE
15 years 9 months ago
Coordinated Checkpoint versus Message Log for Fault Tolerant MPI
Large clusters, high availability clusters and Grid deployments often suffer from network, node or operating system faults and thus require the use of fault tolerant programmin...
Aurelien Bouteiller, Pierre Lemarinier, Gér...
OPODIS
2003
15 years 5 months ago
Emulating Shared-Memory Do-All Algorithms in Asynchronous Message-Passing Systems
A fundamental problem in distributed computing is performing a set of tasks despite failures and delays. Stated abstractly, the problem is to perform N tasks using P failure-prone processor...
Dariusz R. Kowalski, Mariam Momenzadeh, Alexander ...