Memory-processor integration offers new opportunities for reducing the energy of a system. In the case of embedded systems, one solution consists of mapping the most frequently accessed addresses onto the on-chip SRAM to guarantee power and performance efficiency. This option is especially effective when memory access patterns can be profiled and studied at design time (as in typical real-time embedded systems). In this work, we propose an algorithm for the automatic partitioning of on-chip SRAM into multiple banks that can be independently accessed. Starting from the dynamic execution profile of an embedded application running on a given processor core, we synthesize a multi-banked SRAM architecture optimally fitted to the execution profile. The algorithm provides a globally optimal solution to the problem under realistic assumptions on the power cost metrics, and with constraints on the number of memory banks. Results, collected on a set of embedded applications for the ARM processor, have sho...