—Using the well-known ATLAS and LAPACK dense linear algebra libraries, we demonstrate that the parallel management overhead (PMO) can grow with problem size on even statically sc...
This paper presents the study and results of running several core multimedia applications on a simultaneous multithreading (SMT) architecture, including some detailed analysis ran...
Yen-Kuang Chen, Eric Debes, Rainer Lienhart, Matth...
Efficient performance tuning of parallel programs is often hard. We present a performance prediction and visualization tool called VPPB. Based on a monitored uni-processor executi...
As we look to the future, and the prospect of a billion transistors on a chip, it seems inevitable that microprocessors will exploit having multiple parallel threads. To achieve t...
Abstract. Shared last level cache is crucial to performance. However, multithread program model incurs serious contention in shared cache. In this paper, to reduce average cache ac...