In recent years, the increasing number of processor cores and limited increases in main memory bandwidth have led to the problem of the bandwidth wall, where memory bandwidth is becoming a performance bottleneck. This is especially true for emerging latency-insensitive, bandwidth-sensitive applications. Designing the memory hierarchy for a platform with an emphasis on maximizing bandwidth within a fixed power budget becomes one of the key challenges. To facilitate architects to quickly explore the design space of memory hierarchies, we propose an analytical performance model called Moguls. The Moguls model estimates the performance of an application on a system, using the bandwidth demand of the application for a range of cache capacities and the bandwidth provided by the system with those capacities. We show how to extend this model with appropriate approximations to optimize a cache hierarchy under a power constraint. The results show how many levels of cache should be designed, an...
Guangyu Sun, Christopher J. Hughes, Changkyu Kim,