Applications running on the StrongARM SA-1110 or XScale processor cores can specify a cache mapping for each virtual page to achieve better cache utilization. In this work, we describe a method for performing this cache mapping efficiently. Under this scheme, a number of loops are selected for sampling, automatically, based on clock profiling information. We formulate the optimal cache mapping problem as an Integer Linear Programming (ILP) problem. In experiments on 14 test programs, our sample-based cache mapping scheme yields speedups over the default mapping in 13 of them. The geometric mean of