The growth in complexity of modern systems makes it increasingly difficult to extract high-performance. The software stacks for such systems typically consist of multiple layers a...
We propose an organization for the on-chip memory system of a chip multiprocessor, in which 16 processors share a 16MB pool of 256 L2 cache banks. The L2 cache is organized as a n...
Jaehyuk Huh, Changkyu Kim, Hazim Shafi, Lixin Zhan...
For high-data-rate applications, turbo-like iterative decoders are implemented with parallel hardware architecture. However, to achieve high throughput, concurrent accesses to each...
Awais Sani, Philippe Coussy, Cyrille Chavet, Eric ...
Abstract. Prefetching transfers a data item in advance from its storage location to its usage location so that communication is hidden and does not delay computation. We present a ...
Michael Klemm, Jean Christophe Beyler, Ronny T. La...
Recently dual-port SDRAM (DPSDRAM) architecture tailored for dual-processor based mobile embedded systems has been announced where a single memory chip plays the role of the local...
Hoeseok Yang, Sungchan Kim, Hae-woo Park, Jinwoo K...