The use of large instruction windows coupled with aggressive out-oforder and prefetching capabilities has provided significant improvements in processor performance. In this paper...
The growing processor/memory performance gap causes the performance of many codes to be limited by memory accesses. If known to exist in an application, strided memory accesses fo...
Tushar Mohan, Bronis R. de Supinski, Sally A. McKe...
TreadMarks is a distributed shared memory DSM system for standard Unix systems such as SunOS and Ultrix. This paper presents a performance evaluation of TreadMarks running on Ultr...
Peter J. Keleher, Alan L. Cox, Sandhya Dwarkadas, ...
There is building interest in using FPGAs as accelerators for high-performance computing, but existing systems for programming them are so far inadequate. In this paper we propose...
Conventional programming models were designed to be used by expert programmers for programming for largescale multiprocessors, distributed computational clusters, or specialized p...