Stencil-based kernels constitute the core of many scientific applications on block-structured grids. Unfortunately, these codes achieve a low fraction of peak performance, due pr...
Shoaib Kamil, Kaushik Datta, Samuel Williams, Leon...
In this paper we present an exhaustive evaluation of the memory subsystem in a chip-multiprocessor (CMP) architecture composed of 16 cores. The characterization is performed making...
This paper presents the cache configuration exploration of a programmable system, in order to find the best matching between the architecture and a given application. Here, prog...
Pablo Viana, Edna Barros, Sandro Rigo, Rodolfo Aze...
Current high performance computer systems use complex, large superscalar CPUs that interface to the main memory through a hierarchy of caches and interconnect systems. These CPU-c...
The trend of increasing speed and complexity in the single-core processor as stated in the Moore’s law is facing practical challenges. As a result, the multi-core processor arch...
Abdullah Kayi, Yiyi Yao, Tarek A. El-Ghazawi, Greg...