Continuum quantum Monte Carlo (QMC) has proved to be an invaluable tool for predicting the properties of matter from fundamental principles. By solving the many-body Schrödinger equation through stochastic projection, it achieves greater accuracy than mean-field methods and scales better with system size than quantum chemical methods, enabling scientific discovery across a broad spectrum of disciplines. The multiple forms of parallelism in QMC algorithms make them ideal candidates for acceleration on many-core architectures. We present the results of our effort to port the QMCPACK simulation code to the NVIDIA CUDA GPU platform. We restructure the CPU algorithms to expose additional parallelism, minimize GPU-CPU communication, and use the GPU memory hierarchy efficiently. We employ mixed precision on GT200 GPUs and use MPI for inter-process communication and load balancing. In production-level science runs, we observe typical full-application speedups of approximately 10x to 15x relative t...
Kenneth Esler, Jeongnim Kim, David M. Ceperley, Lu