We present a fine-grain dynamic instruction placement algorithm for small L0 scratch-pad memories (SPMs); its unit of transfer can be as small as an individual instruction. Our algorithm captures a large fraction of the instruction reuse missed by coarse-grain placement algorithms, whose unit of transfer is restricted to loops or functions that fit within the capacity of the SPM. An evaluation of L0 SPMs with our fine-grain algorithm on 17 applications shows that the energy consumed by the instruction storage hierarchy is reduced by 38% compared to L0 instruction caches and by 31% compared to L0 SPMs with an ideal coarse-grain algorithm.

Categories and Subject Descriptors: D.3.4 [Programming Languages]: Processors--Compilers, Code Generation, Optimization

General Terms: Algorithms, Experimentation, Performance

Keywords: Code Placement, Compilers, Embedded Systems, ScratchPad Memory
JongSoo Park, James D. Balfour, William J. Dally