—Large-scale GPU clusters are gaining popularity in the scientific computing community. However, their deployment and production use are associated with a number of new challenges. In this paper, we present our efforts to address some of the challenges with building and running GPU clusters in HPC environments. We touch upon such issues as balanced cluster architecture, resource sharing in a cluster environment, programming models, and applications for GPU clusters.
Volodymyr V. Kindratenko, Jeremy Enos, Guochun Shi