This paper presents a new methodology for implementing fast synchronization on scalable cache-coherent multiprocessors through the use of hybrid primitives. Hybrid primitives leverage commodity hardware to speed up the execution of the atomic remote Read-Modify-Write (RMW) instructions that synchronization algorithms use to resolve contention among processors, while exploiting the caches to reduce network traffic during the waiting and release phases of a synchronization primitive. We present a systematic methodology for transforming any synchronization primitive that uses RMW instructions into a hybrid one. We then provide experimental evidence of the effectiveness of hybrid primitives in the implementation of spin locks, barriers, and lock-free queues, using microbenchmarks and parallel applications on an SGI Origin2000.
Dimitrios S. Nikolopoulos, Theodore S. Papatheodor