Sciweavers

ICN
2005
Springer
Load Distribution Performance of the Reliable Server Pooling Framework
The Reliable Server Pooling (RSerPool) protocol suite currently under standardization by the IETF is designed to build systems providing highly available services by prov...
Thomas Dreibholz, Erwin P. Rathgeb, Michael Tü...
OSDI
2006
ACM
Ceph: A Scalable, High-Performance Distributed File System
We have developed Ceph, a distributed file system that provides excellent performance, reliability, and scalability. Ceph maximizes the separation between data and metadata manage...
Sage A. Weil, Scott A. Brandt, Ethan L. Miller, Da...
CLOUDCOM
2010
Springer
REMEM: REmote MEMory as Checkpointing Storage
Checkpointing is a widely used mechanism for supporting fault tolerance, but it is notorious for its high-cost disk access. The idea of memory-based checkpointing has been extensively stu...
Hui Jin, Xian-He Sun, Yong Chen, Tao Ke
SC
2009
ACM
FALCON: a system for reliable checkpoint recovery in shared grid environments
In Fine-Grained Cycle Sharing (FGCS) systems, machine owners voluntarily share their unused CPU cycles with guest jobs, as long as the performance degradation is tolerable. For gu...
Tanzima Zerin Islam, Saurabh Bagchi, Rudolf Eigenm...
SIGCOMM
2005
ACM
OpenDHT: a public DHT service and its uses
Large-scale distributed systems are hard to deploy, and distributed hash tables (DHTs) are no exception. To lower the barriers facing DHT-based applications, we have created a pub...
Sean C. Rhea, Brighten Godfrey, Brad Karp, John Ku...