Energy saving is becoming one of the major design issues in processor architectures with multiple functional units (FUs). Nested loops are usually the most critical parts of multimedia and high-performance DSP applications. For such loops, there is a tradeoff between power saving and performance requirements such as timing constraints and code size. This paper studies how to minimize total energy while satisfying the performance requirements of applications with multidimensional nested loops. We propose an algorithm, energy minimization with loop fusion and FU schedule (EMLFS). It first uses retiming and partitioning to fuse nested loops, and then applies novel FU scheduling algorithms to maximize energy saving without sacrificing performance. Experimental results show that EMLFS achieves a significant average improvement in energy saving.