We present an extension to an existing SPARC V8 instruction set simulator, SimICS, to support accurate profiling of branches and instruction cache misses. SimICS had previously su...
Software DSM systems su er from the high communication and coherence-induced overheads that limit performance. This paper introduces our e orts in reducing system overheads of a h...
Abstract—Chip multiprocessors (CMPs) present a unique scenario for software data prefetching with subtle tradeoffs between memory bandwidth and performance. In a shared L2 based ...
Transactional Memory (TM) provides mechanisms that promise to simplify parallel programming by eliminating the need for locks and their associated problems (deadlock, livelock, pr...
Hassan Chafi, Jared Casper, Brian D. Carlstrom, Au...
Run-time compilation systems are challenged with the task of translating a program’s instruction stream while maintaining low overhead. While software managed code caches are ut...
Vijay Janapa Reddi, Dan Connors, Robert Cohn, Mich...