To fully exploit multicore processors, applications are expected to provide a large degree of thread-level parallelism. While adequate for low core counts and their typical worklo...
Abstract. We present PerfMiner, a system for the transparent collection, storage and presentation of thread-level hardware performance data across an entire cluster. Every sub-proc...
Philip Mucci, Daniel Ahlin, Johan Danielsson, Per ...
The access patterns performed by disk-intensive applications vary widely, from simple contiguous reads or writes through an entire file to completely unpredictable random access....
Dennis Dalessandro, Ananth Devulapalli, Pete Wycko...
We present compiler techniques for translating OpenMP shared-memory parallel applications into MPI messagepassing programs for execution on distributed memory systems. This transl...
This paper presents the first analysis of operating system execution on a simultaneous multithreaded (SMT) processor. While SMT has been studied extensively over the past 6 years,...