Landing OpenMP on Cyclops-64: an efficient mapping of OpenMP to a many-core system-on-a-chip


Download www.capsl.udel.edu

This paper presents our experience mapping the OpenMP parallel programming model to the IBM Cyclops-64 (C64) architecture. The C64 employs a many-core-on-a-chip design that integrates processing logic (160 thread units), embedded memory (5 MB), and communication hardware on the same die. Such a unique architecture presents new opportunities for optimization. Specifically, we consider the following three areas: (1) a memory-aware runtime library that places frequently used data structures in scratchpad memory; (2) a unique spin lock algorithm for shared-memory synchronization based on in-memory atomic instructions and native support for thread-level execution; (3) a fast barrier that directly uses C64 hardware support for collective synchronization. Together, the three optimizations result in an 80% overhead reduction for language constructs in OpenMP. We believe that such a drastic reduction in the cost of managing parallelism makes OpenMP more amenable for writing parallel programs on the ...
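
To make the second optimization area more concrete, the following is a minimal sketch of a spin lock built on an atomic fetch-and-add. It is a generic ticket lock written with portable C11 atomics, not the algorithm from the paper, which this abstract does not spell out. The type `ticket_lock_t` and all function names are hypothetical, chosen for illustration only.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical ticket lock; an illustration, not the C64 runtime's lock. */
typedef struct {
    atomic_uint next_ticket;  /* next ticket to hand out */
    atomic_uint now_serving;  /* ticket currently allowed to enter */
} ticket_lock_t;

static void ticket_lock_init(ticket_lock_t *l) {
    atomic_init(&l->next_ticket, 0);
    atomic_init(&l->now_serving, 0);
}

static void ticket_lock_acquire(ticket_lock_t *l) {
    /* A single atomic fetch-and-add takes a ticket. */
    unsigned my = atomic_fetch_add_explicit(&l->next_ticket, 1,
                                            memory_order_relaxed);
    /* Spin until our ticket is being served. */
    while (atomic_load_explicit(&l->now_serving,
                                memory_order_acquire) != my) {
        /* busy-wait; a runtime could yield or sleep the thread here */
    }
}

static void ticket_lock_release(ticket_lock_t *l) {
    /* Only the lock holder writes now_serving, so a plain increment
       via load + store is sufficient. */
    unsigned cur = atomic_load_explicit(&l->now_serving,
                                        memory_order_relaxed);
    atomic_store_explicit(&l->now_serving, cur + 1, memory_order_release);
}

/* True while some thread holds the lock or is waiting for it. */
static bool ticket_lock_is_held(ticket_lock_t *l) {
    return atomic_load(&l->next_ticket) != atomic_load(&l->now_serving);
}
```

A ticket lock grants the lock in arrival order and needs only one atomic read-modify-write per acquisition, which is one reason designs of this kind are often paired with hardware atomic operations. Whether the C64 lock works this way is an assumption that the abstract alone cannot confirm.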

Juan del Cuvillo, Weirong Zhu, Guang R. Gao


Applied Computing | CF 2006 | Memory Aware Runtime | OpenMP Parallel Programming | Shared Memory Synchronization |


Post Info

Added	20 Aug 2010
Updated	20 Aug 2010
Type	Conference
Year	2006
Where	CF
Authors	Juan del Cuvillo, Weirong Zhu, Guang R. Gao


Sciweavers
