Finite difference methods remain an important and readily parallelisable approach to many numerical simulation problems. Iterative multigrid and multilevel algorithms can converge faster than ordinary finite difference methods but can be more difficult to parallelise. Data-parallel paradigms lend themselves particularly well to solving PDEs on regular meshes, where low-latency communications and high compute-to-communication ratios can yield high levels of computational efficiency and raw performance. We report on some practical algorithmic and data layout approaches, and on performance data for a range of Graphics Processing Units (GPUs) programmed with CUDA. We focus on the use of multiple GPU devices with a single CPU host.
Daniel P. Playne, Kenneth A. Hawick
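
As a rough illustration of the data layout idea behind using several devices for one regular mesh, the following CPU-only sketch splits a 1D mesh into contiguous sub-domains and exchanges one halo cell per side before each explicit finite difference update. It is not the authors' implementation: the function names (`step_single`, `split`, `step_multi`), the 1D diffusion stencil, the periodic boundaries, and the use of plain Python lists in place of GPU device memory are all assumptions made for illustration.

```python
# CPU-only sketch (assumed example): each list in `chunks` stands in for the
# sub-domain that one GPU device would own; the host loop plays the role of
# the single CPU host coordinating the halo exchange.

def step_single(u, alpha):
    """One explicit diffusion update on the whole periodic 1D mesh."""
    n = len(u)
    return [u[i] + alpha * (u[i - 1] - 2.0 * u[i] + u[(i + 1) % n])
            for i in range(n)]

def split(u, parts):
    """Split the mesh into `parts` contiguous, non-empty sub-domains."""
    n = len(u)
    bounds = [n * p // parts for p in range(parts + 1)]
    return [u[bounds[p]:bounds[p + 1]] for p in range(parts)]

def step_multi(chunks, alpha):
    """One update per sub-domain, using halo cells read from the neighbours."""
    parts = len(chunks)
    out = []
    for p, c in enumerate(chunks):
        left = chunks[p - 1][-1]             # halo from left neighbour (periodic)
        right = chunks[(p + 1) % parts][0]   # halo from right neighbour (periodic)
        padded = [left] + c + [right]
        out.append([padded[i] + alpha * (padded[i - 1] - 2.0 * padded[i] + padded[i + 1])
                    for i in range(1, len(padded) - 1)])
    return out
```

Running both versions from the same initial mesh for the same number of steps should give identical values, since each cell sees the same neighbours in either layout. Only the two halo cells per sub-domain cross a sub-domain boundary at each step, which is why the compute-to-communication ratio improves as sub-domains grow.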