Some of the most common parallel programming idioms are locks, barriers, and reduction operations. The interaction of these idioms with the multiprocessor's coherence protocol has a significant impact on performance. In addition, the advent of machines that support multiple coherence protocols raises the question of how best to implement such parallel constructs, i.e., which combination of implementation and coherence protocol yields the best performance. In this paper we study the running time and communication behavior of (1) centralized (ticket) and MCS spin locks, (2) centralized, dissemination, and tree-based barriers, and (3) parallel and sequential reductions, under pure and competitive update coherence protocols; results for a write-invalidate protocol are presented mainly for comparison. Our experiments indicate that parallel programming techniques that are well established for write-invalidate protocols, such as MCS locks and parallel reductions, are...
Ricardo Bianchini, Enrique V. Carrera, Leonidas I.