Reliability at massive scale is one of the biggest challenges we face at Amazon.com, one of the largest e-commerce operations in the world; even the slightest outage has significa...
Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, ...
Latent sector errors in disk drives affect only a few data sectors. They occur silently and are detected only when the affected area is accessed again. If a latent error is detect...
Ningfang Mi, Alma Riska, Evgenia Smirni, Erik Ried...
In this paper, we utilize a bandwidth-centric job communication model that captures the interaction and impact of simultaneously co-allocating jobs across multiple clusters. We ma...
William M. Jones, Walter B. Ligon III, Nishant Shr...
The Domain Name System (DNS) is a critical part of the Internet’s infrastructure, and is one of the few examples of a robust, highly scalable, and operational distributed system....
Jeffrey Pang, James Hendricks, Aditya Akella, Robe...
To cope with the explosive increase in the number of requests to Internet server systems, one popular solution is a load-balancing technique that uses a dispatcher in the front-en...