Future SoCs will contain multiple cores. For workloads with significant parallelism, prior work has shown the benefit of many small, multi-threaded, scalar cores. For workloads th...
Periscope is a distributed automatic online performance analysis system for large scale parallel systems. It consists of a set of analysis agents distributed on the parallel machin...
This work presents the first extensive study of singlenode performance optimization, tuning, and analysis of the fast multipole method (FMM) on modern multicore systems. We consid...
Aparna Chandramowlishwaran, Samuel Williams, Leoni...
In this paper we explore the impact of the block shape on blocked and vectorized versions of the Sparse Matrix-Vector Multiplication (SpMV) kernel and build upon previous work by ...
Vasileios Karakasis, Georgios I. Goumas, Nectarios...
Simultaneous multithreading (SMT) and chip multiprocessing (CMP) both allow a chip to achieve greater throughput, but their relative energy-efficiency and thermal properties are s...
Yingmin Li, David Brooks, Zhigang Hu, Kevin Skadro...