Exploiting locality at run-time complements compiler-based approaches for applications with dynamic memory access patterns. This paper proposes a memory-layout-oriented approach to exploiting cache locality for parallel loops at run-time on Symmetric Multi-Processor (SMP) systems. Guided by application-dependent hints and the targeted cache architecture, it reorganizes and partitions a parallel loop by shrinking and partitioning the loop's memory-access space at run-time. In the generated task partitions, data sharing among partitions is minimized and data reuse within a partition is maximized. The tasks in the partitions are scheduled in an adaptive, locality-preserving way that trades off load balance against locality to minimize application execution time. Based on simulation and measurement, we show that our run-time approach can achieve performance comparable to compiler optimizations for two application...
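
As a rough illustration of the idea of partitioning a loop by its memory-access space, the following minimal Python sketch groups loop iterations by the cache-sized block each one touches and then hands whole partitions to workers. It is not the algorithm proposed in this paper: the function names, the one-address-per-iteration hint, the `block_size` parameter, and the greedy least-loaded assignment are all assumptions made for illustration, and the assignment is computed up front rather than adapted during execution.

```python
from collections import defaultdict


def partition_by_block(access_addrs, block_size):
    """Group loop iterations by the memory block they touch.

    access_addrs[i] is a hypothetical hint: the address accessed by
    iteration i. block_size stands in for a cache-derived block size.
    """
    groups = defaultdict(list)
    for iteration, addr in enumerate(access_addrs):
        groups[addr // block_size].append(iteration)
    # Return partitions ordered by block index; iterations that share a
    # block end up in the same partition, so no block spans two partitions.
    return [groups[block] for block in sorted(groups)]


def assign_partitions(partitions, n_workers):
    """Assign whole partitions to workers, largest first, least-loaded worker.

    Keeping each partition intact preserves its data reuse; choosing the
    least-loaded worker approximates load balance (an assumed heuristic).
    """
    queues = [[] for _ in range(n_workers)]
    loads = [0] * n_workers
    for part in sorted(partitions, key=len, reverse=True):
        worker = loads.index(min(loads))
        queues[worker].append(part)
        loads[worker] += len(part)
    return queues, loads


if __name__ == "__main__":
    # Eight iterations with made-up access addresses and 64-byte blocks.
    addrs = [0, 130, 5, 64, 70, 129, 3, 200]
    parts = partition_by_block(addrs, block_size=64)
    queues, loads = assign_partitions(parts, n_workers=2)
    print(parts)
    print(queues, loads)
```

In this sketch, locality comes from never splitting a partition across workers, while load balance comes only from the order in which partitions are handed out; a run-time scheduler of the kind described above would additionally adjust the assignment as execution proceeds.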