Mapping the LU decomposition on a many-core architecture: challenges and solutions

Recently, multi-core architectures with alternative memory subsystem designs have emerged. Instead of using hardware-managed cache hierarchies, they employ software-managed embedded memory. An open question is which programming and compiling methods are effective at exploiting the performance potential of this new class of architectures. Using the LU decomposition as a case study, we propose three techniques that, combined, achieve a 27-fold speedup on the IBM Cyclops-64 many-core architecture compared to the parallel LU implementation from the SPLASH-2 benchmark suite. Our first method allows adaptive load distribution, which maximizes load balance among cores; this is important for leveraging the potential of the next two methods. Second, we developed a method for register tiling that determines the optimal data tile parameters and maximizes data reuse according to register file size constraints. We demonstrate that our method is inherently general and that it should have a much br...
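The register tiling idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `lu_in_place`, the `tile` parameter, and the default 2x2 tile size are assumptions chosen for illustration, and Python has no register file, so the small local lists `col` and `row` only stand in for values that a compiled version would keep in registers. The sketch performs an in-place LU decomposition without pivoting and applies the trailing-matrix update one tile at a time, so each loaded column and row value is reused across the whole tile.

```python
def lu_in_place(a, tile=2):
    """In-place LU decomposition without pivoting (illustrative sketch).

    On return, the strict lower triangle of `a` holds L (unit diagonal
    implied) and the upper triangle, including the diagonal, holds U.
    `tile` is a hypothetical tile edge length, not a value from the paper.
    """
    n = len(a)
    for k in range(n):
        pivot = a[k][k]
        if pivot == 0.0:
            raise ZeroDivisionError("zero pivot; this sketch does no pivoting")

        # Scale the column below the pivot to form column k of L.
        for i in range(k + 1, n):
            a[i][k] /= pivot

        # Trailing update, processed in tile x tile blocks.
        for i0 in range(k + 1, n, tile):
            i1 = min(i0 + tile, n)
            # Column values loaded once and reused for every tile in this row band.
            col = [a[i][k] for i in range(i0, i1)]
            for j0 in range(k + 1, n, tile):
                j1 = min(j0 + tile, n)
                # Row values loaded once and reused for every row of the tile.
                row = a[k][j0:j1]
                for di, i in enumerate(range(i0, i1)):
                    for dj, j in enumerate(range(j0, j1)):
                        a[i][j] -= col[di] * row[dj]
    return a
```

With a full 2x2 tile, four multiply-subtract operations are served by four loaded values (two from the column, two from the row) rather than two loads per operation, which is the kind of data reuse a register tile is meant to provide. Choosing the tile shape under a real register file size constraint is the problem the paper addresses and is not attempted here.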

Ioannis E. Venetis, Guang R. Gao

Alternative Memory Subsystem | Applied Computing | CF 2009 | Register Allocation Method | Software-managed Embedded Memory

Post Info

Added	28 May 2010
Updated	28 May 2010
Type	Conference
Year	2009
Where	CF
Authors	Ioannis E. Venetis, Guang R. Gao