As shared-memory multiprocessors become the dominant commodity source of computation, parallelizing compilers must support mainstream computations that manipulate irregular, pointer-based data structures such as lists, trees, and graphs. Our experience with a parallelizing compiler for this class of applications shows that their synchronization requirements differ significantly from those of traditional parallel computations. Instead of coarse-grain barrier synchronization, irregular computations require synchronization primitives that support efficient fine-grain atomic operations. The standard mechanism for implementing atomic operations uses mutual exclusion locks, but the overhead of acquiring and releasing locks can reduce performance, and the locks themselves can consume significant amounts of memory. Optimistic synchronization primitives such as load linked/store conditional are an attractive alternative: they require no additional memory and eliminate the use of heavyweight blocking ...
Martin C. Rinard
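
To make the contrast concrete, the sketch below (not from the paper) shows a fine-grain atomic update on a node of a pointer-based structure written as an optimistic retry loop. On machines that provide load linked/store conditional, such a loop maps directly onto those instructions; here it is expressed with portable C11 atomics, and the `node_t` type and `atomic_add` function are hypothetical names chosen for illustration.

```c
#include <stdatomic.h>

/* Hypothetical node of an irregular, pointer-based data structure
 * with a counter field that multiple threads may update. */
typedef struct node {
    _Atomic int count;
} node_t;

/* Optimistic update: read the current value, compute the new value,
 * and commit only if no other thread has modified the field in the
 * meantime; otherwise retry.  No lock is acquired and no extra
 * per-object memory is needed. */
void atomic_add(node_t *n, int delta) {
    int old = atomic_load_explicit(&n->count, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(
               &n->count, &old, old + delta,
               memory_order_acq_rel, memory_order_relaxed)) {
        /* old now holds the freshly observed value; loop and retry. */
    }
}
```

A lock-based implementation of the same update would need a mutex per node (or a hashed lock table) and would pay the acquire/release cost on every operation, which is the overhead the abstract refers to.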