In this paper we present a multi-GPU parallel volume rendering implementation built using the MapReduce programming model. We give implementation details of the library, including specific optimizations made for our rendering and compositing design. We analyze the theoretical peak performance and bottlenecks for all required tasks and show that our system significantly reduces computation as a bottleneck in the ray-casting phase. We demonstrate that our rendering speeds are adequate for interactive visualization (our system is capable of rendering a 1024³ floating-point sampled volume in under one second using 8 GPUs), and that our system is capable of delivering both in-core and out-of-core visualizations. We argue that a multi-GPU MapReduce library is a good fit for parallel volume rendering because it is easy to program for, scales well, and eliminates the need to focus on I/O algorithms, allowing the focus to be on visualization algorithms instead. We show that our system scal...
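To make the division of rendering and compositing into MapReduce tasks concrete, the following is a minimal single-process sketch, not the library described here. It assumes an orthographic view along +z, a volume split into depth slabs ("bricks"), and a hypothetical grey-scale transfer function. Each map task ray-casts one brick and emits a premultiplied (color, alpha) fragment keyed by pixel. The shuffle groups fragments by pixel, and each reduce task sorts its fragments by depth and composites them front to back. The names `transfer`, `march`, `map_brick`, `reduce_pixel`, and `render` are illustrative assumptions; on a GPU system each map task would run on a device holding its own brick.

```python
from collections import defaultdict


def transfer(v):
    """Hypothetical transfer function: grey color equal to the clamped scalar, opacity 0.1 * scalar."""
    v = min(max(v, 0.0), 1.0)
    return v, 0.1 * v  # (color, alpha)


def march(volume, x, y, z0, z1):
    """Front-to-back ray march along +z over slices z0..z1-1; returns premultiplied (color, alpha)."""
    color, alpha = 0.0, 0.0
    for z in range(z0, z1):
        c, a = transfer(volume[z][y][x])
        color += (1.0 - alpha) * a * c
        alpha += (1.0 - alpha) * a
    return color, alpha


def map_brick(volume, z0, z1):
    """Map task: ray-cast one brick (slab z0..z1-1) and emit one (pixel, fragment) pair per pixel."""
    height, width = len(volume[0]), len(volume[0][0])
    for y in range(height):
        for x in range(width):
            c, a = march(volume, x, y, z0, z1)
            yield (x, y), (z0, c, a)  # fragment carries its depth for sorting in reduce


def reduce_pixel(fragments):
    """Reduce task: sort one pixel's fragments by depth and apply the 'over' operator front to back."""
    color, alpha = 0.0, 0.0
    for _, c, a in sorted(fragments):
        color += (1.0 - alpha) * c  # c is already premultiplied by its own alpha
        alpha += (1.0 - alpha) * a
    return color, alpha


def render(volume, brick_depth):
    """Run map over every brick, shuffle fragments by pixel key, then reduce each pixel."""
    groups = defaultdict(list)
    for z0 in range(0, len(volume), brick_depth):
        z1 = min(z0 + brick_depth, len(volume))
        for key, frag in map_brick(volume, z0, z1):
            groups[key].append(frag)
    return {key: reduce_pixel(frags) for key, frags in groups.items()}


if __name__ == "__main__":
    # Small synthetic volume indexed volume[z][y][x]: 8 slices of 3 x 4 samples.
    vol = [[[(x + y + z) / 12 for x in range(4)] for y in range(3)] for z in range(8)]
    image = render(vol, brick_depth=3)
    print(image[(0, 0)])
```

Because front-to-back "over" compositing is associative, splitting the volume into bricks and compositing the per-brick fragments in depth order yields the same image (up to floating-point rounding) as marching each ray through the whole volume; this is what lets the ray-casting work be distributed across map tasks.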
Jeff A. Stuart, Cheng-Kai Chen, Kwan-Liu Ma, John