Three protocols for gossip-based failure detection services in large-scale heterogeneous clusters are analyzed and compared. The basic gossip protocol provides a means by which fai...
We have developed Ceph, a distributed file system that provides excellent performance, reliability, and scalability. Ceph maximizes the separation between data and metadata manage...
Sage A. Weil, Scott A. Brandt, Ethan L. Miller, Da...
The memory subsystem accounts for a significant cost and power budget of a computer system. Current DRAM-based main memory systems are starting to hit the power and cost limit. A...
Moinuddin K. Qureshi, Vijayalakshmi Srinivasan, Ju...
Distributed Shared Memory (DSM) combines the scalability of loosely coupled multicomputer systems with the ease of usability of tightly coupled multiprocessors, and allows transpa...
M. Rasit Eskicioglu, T. Anthony Marsland, Weiwu Hu...
—Current leadership-class machines suffer from a significant imbalance between their computational power and their I/O bandwidth. While Moore’s law ensures that the computatio...
Nawab Ali, Philip H. Carns, Kamil Iskra, Dries Kim...