Memory systems for conventional large-scale computers provide only limited bytes/s of data bandwidth when compared to their flop/s of instruction execution rate. The resulting bottleneck limits the bytes/flop that a processor may access from the full memory footprint of a machine and can hinder overall performance. This paper discusses physical and functional views of memory hierarchies and examines existing ratios of bandwidth to execution rate versus memory capacity (or bytes/flop versus capacity) found in a number of large-scale computers. The paper then explores a set of technologies, Proximity Communication, low-power on-chip networks, dense optical communication, and Sea-of-Anything interconnect, that can flatten this bandwidth hierarchy to relieve the memory bottleneck in a large-scale computer that we call “Hero.”
Robert J. Drost, Craig Forrest, Bruce Guenin, Ron