The cache hierarchy design in existing SMT and superscalar processors is optimized for latency, but not for bandwidth. The size of the L1 data cache did not scale over the past de...
We investigate the implementation of IP look-up for core routers using multiple microengines and a tailored memory hierarchy. The main architectural concerns are limiting the numb...
Jean-Loup Baer, Douglas Low, Patrick Crowley, Neal...
Data-parallel programs are both growing in importance and increasing in diversity, resulting in specialized processors targeted at specific classes of these programs. This paper ...
Karthikeyan Sankaralingam, Stephen W. Keckler, Wil...
Multiple memory banks design is employed in many high performance DSP processors. This architectural feature supports higher memory bandwidth by allowing multiple data memory acce...
Chun Jason Xue, Tiantian Liu, Zili Shao, Jingtong ...
Memory subsystems have been considered as one of the most critical components in embedded systems and furthermore, displaying increasing complexity as application requirements div...