With shrinking feature size of silicon fabrication technology, architects are putting more and more logic into a single die. While one might opt to use these transistors for build...
Lin Li, Narayanan Vijaykrishnan, Mahmut T. Kandemi...
Parallel programs that modify shared data in a cachecoherent multiprocessor with a write-invalidate coherence protocol create ownership overhead in the form of ownership acquisiti...
Thread-level speculation is a technique that enables parallel execution of sequential applications on a multiprocessor. This paper describes the complete implementation of the sup...
This paper describes the synchronization and communication primitives of the Cray T3E multiprocessor, a shared memory system scalable to 2048 processors. We discuss what we have l...
Recent advances in Field-Programmable Gate Arrays (FPGA) and programmable interconnects have made it possible to build efficient hardware emulation engines. In addition, improveme...