CF
2009
ACM

Mapping the LU decomposition on a many-core architecture: challenges and solutions

Recently, multi-core architectures with alternative memory subsystem designs have emerged. Instead of using hardware-managed cache hierarchies, they employ software-managed embedded memory. An open question is which programming and compiling methods are effective in exploiting the performance potential of this new class of architectures. Using the LU decomposition as a case study, we propose three techniques that, combined, achieve a 27-fold speedup on the IBM Cyclops-64 many-core architecture compared to the parallel LU implementation from the SPLASH-2 benchmark suite. Our first method allows adaptive load distribution, which maximizes load balance among cores; this is important for leveraging the potential of the next two methods. Second, we developed a method for register tiling that determines the optimal data tile parameters and maximizes data reuse according to register file size constraints. We demonstrate that our method is inherently general and that it should have a much br...
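To make the tiling idea in the abstract concrete, here is a minimal sketch of an in-place LU decomposition whose trailing-submatrix update proceeds in small tiles. It is not the authors' implementation: the function names `lu_in_place` and `reconstruct`, the `tile` parameter, and the absence of pivoting are all assumptions made for illustration. Each tile loads a short slice of the pivot column and pivot row once and reuses those values for every element in the tile, which is the kind of reuse register tiling aims for.

```python
def lu_in_place(a, tile=2):
    """Doolittle LU without pivoting (assumes nonzero pivots).

    Overwrites `a` (a list of lists of floats) with L below the diagonal
    (unit diagonal implied) and U on and above the diagonal.
    `tile` is a hypothetical tile edge length, not a value from the paper.
    """
    n = len(a)
    for k in range(n):
        pivot = a[k][k]
        # Scale the pivot column to obtain column k of L.
        for i in range(k + 1, n):
            a[i][k] /= pivot
        # Update the trailing submatrix one tile at a time.
        for i0 in range(k + 1, n, tile):
            i1 = min(i0 + tile, n)
            for j0 in range(k + 1, n, tile):
                j1 = min(j0 + tile, n)
                # Load the tile's slice of column k and row k once...
                col = [a[i][k] for i in range(i0, i1)]
                row = a[k][j0:j1]
                # ...and reuse each loaded value across the whole tile.
                for di, l_ik in enumerate(col):
                    target = a[i0 + di]
                    for dj, u_kj in enumerate(row):
                        target[j0 + dj] -= l_ik * u_kj
    return a


def reconstruct(lu):
    """Multiply the packed L and U factors back into a full matrix."""
    n = len(lu)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(min(i, j) + 1):
                l_ik = 1.0 if k == i else lu[i][k]
                out[i][j] += l_ik * lu[k][j]
    return out
```

In Python the tile values live in ordinary lists rather than registers, so the sketch only shows the loop structure: a tile of edge length t performs t*t multiply-subtract operations after loading 2t values from the pivot row and column. On real hardware the tile shape would be chosen against the register file size, as the abstract describes.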
Ioannis E. Venetis, Guang R. Gao
Added 28 May 2010
Updated 28 May 2010
Type Conference