As high-end computing systems continue to grow in scale, recent advances in multiand many-core architectures have pushed such growth toward more denser architectures, that is, mor...
Pavan Balaji, Darius Buntinas, David Goodell, Will...
The number of multithreaded Message Passing Interface (MPI) implementations and applications is increasing rapidly. We discuss how multithreaded applications can receive messages o...
Torsten Hoefler, Greg Bronevetsky, Brian Barrett, ...
Latency tolerance is essential in achieving high performance on parallel computers for remote function calls and fine-grained remote memory accesses. EM-X supports interprocessor ...
In this paper, we present hthreads, a unifying programming model for specifying application threads running within a hybrid CPU/FPGA system. Threads are specified from a single p...
Erik Anderson, Jason Agron, Wesley Peck, Jim Steve...
Memory may be the only system component that is more commoditized than a microprocessor. To simultaneously exploit this and address the impending memory wall, processing in memory...
Arun Rodrigues, Richard C. Murphy, Peter M. Kogge,...