Cray X1 Fortran and C/C++ compilers provide a number of loop transformations, notably vectorization and multistreaming, in order to exploit the multistreaming processor (MSP) hard...
We describe the design and implementation of Dynamo, a software dynamic optimization system that is capable of transparently improving the performance of a native instruction stre...
Vasanth Bala, Evelyn Duesterwald, Sanjeev Banerjia
Current shared memory multicore and multiprocessor systems are nondeterministic. Each time these systems execute a multithreaded application, even if supplied with the same input,...
Joseph Devietti, Brandon Lucia, Luis Ceze, Mark Os...
Abstract--Multi-core processors with accelerators are becoming commodity components for high-performance computing at scale. While accelerator-based processors have been studied in...
M. Mustafa Rafique, Ali Raza Butt, Dimitrios S. Ni...
Advances in VLSI technology will enable chips with over a billion transistors within the next decade. Unfortunately, the centralized-resource architectures of modern microprocesso...
Walter Lee, Rajeev Barua, Matthew Frank, Devabhakt...