This paper proposes a non-uniform cache architecture for reducing the power consumption of memory systems. The non-uniform cache allows each cache-set to have its own associativity (i.e., its own number of cache-ways). An algorithm determines the optimum number of cache-ways for each cache-set and generates object code suitable for the non-uniform cache memory. The paper also proposes a compiler technique for reducing redundant cache-way accesses and cache-tag accesses. Experiments demonstrate that our technique can reduce the power consumption of memory systems by up to 76% compared to the best result achieved by the conventional method.

Categories and Subject Descriptors
C.3 [Special-Purpose and Application-Based Systems]: Microprocessor/microcomputer applications

General Terms
Algorithms, Performance, Design

Keywords
Microprocessor, Cache Memory, Compiler, Embedded Systems
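
To make the idea of per-set associativity concrete, the following is a minimal illustrative sketch, not the architecture or algorithm evaluated in this paper. It models a cache in which each cache-set has its own number of cache-ways and counts how many ways are probed per access as a rough stand-in for access energy. The names `NonUniformCache`, `ways_per_set`, `way_probes`, and `run_trace` are hypothetical, and the LRU replacement policy, the 32-byte line size, and the assumption that every enabled way of the indexed set is probed on each access are choices made only for this example.

```python
from collections import OrderedDict


class NonUniformCache:
    """Toy cache model in which each set may have a different number of ways.

    Assumptions for illustration only: LRU replacement, and every enabled
    way of the indexed set is probed on each access.
    """

    def __init__(self, ways_per_set, line_size=32):
        if any(w < 1 for w in ways_per_set):
            raise ValueError("each set needs at least one way in this sketch")
        self.ways_per_set = list(ways_per_set)
        self.line_size = line_size
        # One ordered dict per set; keys are tags, ordered oldest to newest.
        self.sets = [OrderedDict() for _ in self.ways_per_set]
        self.hits = 0
        self.misses = 0
        self.way_probes = 0  # crude proxy for access energy

    def access(self, address):
        """Access one byte address; return True on a hit, False on a miss."""
        block = address // self.line_size
        index = block % len(self.sets)
        tag = block // len(self.sets)
        ways = self.ways_per_set[index]
        lines = self.sets[index]

        self.way_probes += ways
        if tag in lines:
            lines.move_to_end(tag)  # mark as most recently used
            self.hits += 1
            return True

        self.misses += 1
        if len(lines) >= ways:
            lines.popitem(last=False)  # evict the least recently used line
        lines[tag] = True
        return False


def run_trace(cache, addresses):
    """Feed a list of byte addresses to the cache and return it."""
    for address in addresses:
        cache.access(address)
    return cache


# Hypothetical trace: two blocks compete for set 0, the other sets see one block each.
trace = [0, 128, 0, 128, 32, 32, 64, 96]
uniform = run_trace(NonUniformCache([2, 2, 2, 2]), trace)
tuned = run_trace(NonUniformCache([2, 1, 1, 1]), trace)
print(uniform.hits, uniform.way_probes)
print(tuned.hits, tuned.way_probes)
```

On this hand-made trace, both configurations see the same number of hits, while the configuration with fewer ways in the lightly used sets probes fewer ways in total. A realistic evaluation would require a measured energy model and real address traces, neither of which this sketch provides.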