Memory latency tolerant architectures support thousands of in-flight instructions without scaling cyclecritical processor resources, and thousands of useful instructions can compl...
Amit Gandhi, Haitham Akkary, Ravi Rajwar, Srikanth...
Software fine-grain distributed shared memory (FGDSM) provides a simplified shared-memory programming interface with minimal or no hardware support. Originally software FGDSMs tar...
Ioannis Schoinas, Babak Falsafi, Mark D. Hill, Jam...
In this paper, we consider the tree task graphs which arise from many important programming paradigms such as divide and conquer, branch and bound etc., and the linear task-graphs...
We describe the Slice Processor micro-architecture that implements a generalized operation-based prefetching mechanism. Operation-based prefetchers predict the series of operation...
Andreas Moshovos, Dionisios N. Pnevmatikatos, Amir...
For a while it seemed possible to pretend that all interaction between an algorithm and its environment occurs inter-step, but not anymore. Andreas Blass, Benjamin Rossman and the ...