Effective use of the memory hierarchy is critical for achieving high performance on embedded systems. We focus on the class of streaming applications, which is increasingly prevalent in the embedded domain. We exploit the widespread parallelism and regular communication patterns in stream programs to formulate a set of cache aware optimizations that automatically improve instruction and data locality. Our work is in the context of the Synchronous Dataflow model, in which a program is described as a graph of independent actors that communicate over channels. The communication rates between actors are known at compile time, allowing the compiler to statically model the caching behavior. We present three cache aware optimizations: 1) execution scaling, which judiciously repeats actor executions to improve instruction locality, 2) cache aware fusion, which combines adjacent actors while respecting instruction cache constraints, and 3) scalar replacement, which converts certain data buffe...
Janis Sermulins, William Thies, Rodric M. Rabbah,