HIPC
2005
Springer

Performance Study of LU Decomposition on the Programmable GPU

With the increasing programmability of GPUs (graphics processing units), these units are emerging as an attractive computing platform not only for traditional graphics computation but also for general-purpose computation. In this paper, to study the performance of programmable GPUs, we describe the design and implementation of LU decomposition as an example of numerical computation. To this end, we develop and evaluate several methods that differ in their approach to (a) loop processing, (b) branch processing, and (c) vector processing. The experimental results yield four key findings: (1) dependent loops must be implemented with a render texture to avoid copies in video random access memory (VRAM); (2) in most cases, branch processing is handled more efficiently by the CPU than by the GPU; (3) as Fatahalian et al. report for matrix multiplication, we find that GPUs require higher VRAM cache bandwidth in order to provide fu...
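To make the loop and branch findings more concrete, here is a minimal CPU-side sketch in Python of a right-looking LU decomposition. It is not the authors' GPU implementation. The function names (`lu_decompose`, `split_lu`, `matmul`) are hypothetical, pivoting is omitted as a simplifying assumption, and the right-looking variant is chosen for illustration only. Each inner loop nest stands in for one GPU rendering pass in which every element can be computed independently. The outer loop over `k` is a dependent loop: step `k` reads values written at step `k - 1`. Per finding (1), such a loop on the GPU would be implemented with a render texture rather than by copying data within VRAM. The host loop bounds select which region is updated (column `k` or the trailing submatrix), so no per-element `if` is needed. This loosely mirrors finding (2), where branching is moved to the CPU.

```python
def lu_decompose(a):
    """Doolittle LU factorization without pivoting (illustrative sketch).

    Returns one n x n matrix that holds U on and above the diagonal and the
    multipliers of the unit lower-triangular L strictly below it.
    """
    n = len(a)
    m = [row[:] for row in a]  # work on a copy; the input is left untouched
    for k in range(n):  # dependent loop: step k needs the results of step k-1
        pivot = m[k][k]
        if pivot == 0:
            raise ZeroDivisionError("zero pivot; this sketch does not pivot")
        # "Pass" 1: scale column k below the diagonal (each element independent).
        for i in range(k + 1, n):
            m[i][k] /= pivot
        # "Pass" 2: rank-1 update of the trailing submatrix (each element
        # independent). The loop bounds, chosen on the host, select the region,
        # so no per-element branch is required.
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                m[i][j] -= m[i][k] * m[k][j]
    return m


def split_lu(m):
    """Split the packed result into explicit L (unit diagonal) and U."""
    n = len(m)
    lower = [[m[i][j] if j < i else (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    upper = [[m[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return lower, upper


def matmul(x, y):
    """Plain triple-loop matrix product, used only to check L * U == A."""
    rows, inner, cols = len(x), len(y), len(y[0])
    return [[sum(x[i][t] * y[t][j] for t in range(inner)) for j in range(cols)]
            for i in range(rows)]


if __name__ == "__main__":
    print(lu_decompose([[4.0, 3.0], [6.0, 3.0]]))  # [[4.0, 3.0], [1.5, -1.5]]
```

On a GPU, the two inner loop nests would become separate passes that each write a full output region. The vector-processing approaches in (c) are not modeled in this sketch.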
Added 27 Jun 2010
Updated 27 Jun 2010
Type Conference
Year 2005
Where HIPC
Authors Fumihiko Ino, Manabu Matsui, Keigo Goda, Kenichi Hagihara