Abstract—The Active Memory Cube (AMC) is a novel near-memory processor that exploits high memory bandwidth and low latency close to DRAM to execute scientific applications in an energy-efficient manner. Its energy efficiency derives from pairing a novel scalar-vector data-flow path with a simple control-flow path, a design that required the development of a sophisticated compiler co-designed with the architecture. Such co-design is commonly done using hand-tuned codes for simple kernels, which typically neither capture the nuances of real-world applications nor reveal the complexities of programming a heterogeneous system. At the same time, an entire application is intractable for an early-stage compiler. In this work we describe a progressive, iterative methodology for the co-design of the compiler and architecture for the AMC using LULESH, a real-world hydrodynamics proxy application. We focus on a procedure that calculates the kinematic variables for domain elements. ...
Arpith C. Jacob, Ravi Nair, Tong Chen, Zehra Sura,