In this paper, we present several novel strategies for improving software-controlled cache utilization in order to lower the power requirements of multimedia and signal processing applications. Our methodology targets embedded multimedia and DSP processors. It takes into account many program parameters, such as the locality of data, the size of data structures, the access structures of large array variables, the regularity of loop nests, and the size and type of the cache, with the objective of improving cache performance and thereby reducing power. We also account for the overhead that the different transformations may add to the instruction count and the number of execution cycles, so that real-time constraints and code size limitations are still met. Experiments on a real-life demonstrator show that our methodology achieves a significant reduction in power requirements while meeting all other system constraints.