This paper proposes a novel way to use virtual memorymapped communication (VMMC) to reduce the failover time on clusters. With the VMMC model, applications’ virtual address spac...
Researchers have recently proposed coupling small- to mediumscale multiprocessors to build large-scale shared memory machines, known as multigrain shared memory systems. Multigrai...
A cache line size has a signi cant e ect on missrate and memorytra c. Today's computers use a xed line size, typically 32B, which may not be optimalfor a given application. O...
Alexander V. Veidenbaum, Weiyu Tang, Rajesh K. Gup...
The Virtual Interface (VI) Architecture provides protected userlevel communication with high delivered bandwidth and low permessage latency, particularly for small messages. The V...
As the gap between memory and processor speeds continues to widen, cache efficiency is an increasingly important component of processor performance. Compiler techniques have been...
This paper presents an investigation into local mechanisms and scheduling policies that allow guest processes to efficiently exploit otherwise-idle workstation resources. Unlike t...
Kyung Dong Ryu, Jeffrey K. Hollingsworth, Peter J....
We introduce dependence-based pre-computation as a complement to history-based target prediction schemes. We present pre-computation in the context of virtual function calls (v-ca...
This paper presents a technique, adaptive replication, for automatically eliminating synchronization bottlenecks in multithreaded programs that perform atomic operations on object...
Memory operations remain a significant bottleneck in dynamically scheduled pipelined processors, due in part to the inability to statically determine the existence of memory addr...
Glenn Reinman, Brad Calder, Dean M. Tullsen, Gary ...
—This paper explores the use of compiler optimizations which optimize the layout of instructions in memory. The target is to enable the code to make better use of the underlying ...