Sciweavers

482 search results - page 38 / 97
» A large-scale study of failures in high-performance computin...
AP2PS
2009
IEEE
15 years 6 months ago
Algorithm-Based Fault Tolerance Applied to P2P Computing Networks
P2P computing platforms are subject to a wide range of attacks. In this paper, we propose a generalisation of the previous disk-less checkpointing approach for fault-tolerance i...
Thomas Roche, Mathieu Cunche, Jean-Louis Roch
DSN
2000
IEEE
15 years 8 months ago
On the Quality of Service of Failure Detectors
We study the quality of service (QoS) of failure detectors. By QoS, we mean a specification that quantifies 1) how fast the failure detector detects actual failures and 2) how we...
Wei Chen, Sam Toueg, Marcos Kawazoe Aguilera
DSN
2006
IEEE
15 years 9 months ago
BlueGene/L Failure Analysis and Prediction Models
The growing computational and storage needs of several scientific applications mandate the deployment of extreme-scale parallel machines, such as IBM’s BlueGene/L which can acc...
Yinglung Liang, Yanyong Zhang, Anand Sivasubramani...
SIGMETRICS
1994
ACM
15 years 7 months ago
An Empirical Study of a Highly Available File System
In this paper we present results from a six-month empirical study of the high availability aspects of the Coda File System. We report on the service failures experienced by Coda clien...
Brian Noble, Mahadev Satyanarayanan
EDBT
2008
ACM
16 years 3 months ago
P2P systems with transactional semantics
Structured P2P systems have been developed for constructing applications at internet scale in cooperative environments and exhibit a number of desirable features such as scalabili...
Shyam Antony, Divyakant Agrawal, Amr El Abbadi