Efficiently utilizing off-chip DRAM bandwidth is a critical issue in designing cost-effective, high-performance chip multiprocessors (CMPs). Conventional memory controllers deliver relatively low performance in part because they often employ fixed, rigid access scheduling policies designed for average-case application behavior. As a result, they cannot learn and optimize the long-term performance impact of their scheduling decisions, and cannot adapt their scheduling policies to dynamic workload behavior. We propose a new, self-optimizing memory controller design that operates using the principles of reinforcement learning (RL) to overcome these limitations. Our RL-based memory controller observes the system state and estimates the long-term performance impact of each action it can take. In this way, the controller learns to optimize its scheduling policy on the fly to maximize long-term performance. Our results show that an RL-based memory controller improves the performance of a...
Engin Ipek, Onur Mutlu, José F. Martí