Embedded system programs tend to spend much of their time in small loops. Introducing a very small loop cache into the instruction memory hierarchy has thus been shown to substantially reduce instruction fetch energy. However, loop caches come in many sizes and variations; using the configuration that is best on average may actually increase energy consumption for a specific program. We therefore introduce a loop cache exploration tool that analyzes a particular program’s profile, rapidly explores the possible configurations, and generates the configuration with the greatest power savings. We introduce a simulation-based approach and show the good energy savings that a customized loop cache yields. We also introduce a fast estimation-based approach that obtains nearly the same results in seconds rather than tens of minutes or hours.

Keywords
Low power, low energy, tuning, loop cache, embedded systems, instruction fetching, customized architectures, memory hierarchy, estimation, synthesis.