This paper presents a framework for analyzing the performance of multithreaded programs using a model called a constraint graph. We review previous constraint graph definitions fo...
Shared memory multiprocessor systems typically provide a set of hardware primitives in order to support synchronization. Generally, they provide single-word read-modify-write hard...
Graphics and media processing is quickly emerging to become one of the key computing workloads. Programmable graphics processors give designers extra flexibility by running a sma...
Mauricio Breternitz Jr., Herbert H. J. Hum, Sanjee...
In Thread-Level Speculation (TLS), speculative tasks generate memory state that cannot simply be combined with the rest of the system because it is unsafe. One way to deal with th...
We investigate the implementation of IP look-up for core routers using multiple microengines and a tailored memory hierarchy. The main architectural concerns are limiting the numb...
Jean-Loup Baer, Douglas Low, Patrick Crowley, Neal...
This paper describes Compiler-Directed Content-Aware Prefetching (CDCAP), an integrated compiler and hardware approach for prefetching dynamic data structures. The approach utiliz...
Computers have changed the way we live, work, and even recreate. Now, they are transforming how we think about and treat human disease. In particular, advanced techniques in biome...