In this paper, we describe a generalized approach to deriving a custom data layout in multiple memory banks for array-based computations, to facilitate high-bandwidth parallel mem...
The performance of future manycore processors will only scale with the number of integrated cores if there is a corresponding increase in memory bandwidth. Projected scaling of el...
Scott Beamer, Chen Sun, Yong-Jin Kwon, Ajay Joshi,...
Recently-proposed processor microarchitectures for high Memory Level Parallelism (MLP) promise substantial performance gains. Unfortunately, current cache hierarchies have Miss-Ha...
In this paper we address the design of a future high-speed router that supports line rates as high as OC-3072 (160 Gb/s), around one hundred ports and several service classes. Bui...
In many-core architectures, memory blocks are commonly assigned to the banks of a NUCA cache by following a physical mapping. This mapping assigns blocks to cache banks in a round-...
Alberto Ros, Marcelo Cintra, Manuel E. Acacio, Jos...