In distributed shared memory multiprocessors, remote memory references generate processor-to-memory traffic, which may result in a bottleneck. It is therefore important to design algorithms that minimize the number of remote memory references. We establish a lower bound of three on remote reference time complexity for mutual exclusion algorithms in a model where processes communicate by means of a general read-modify-write primitive that accesses at most one shared variable in one instruction. Since the general read-modify-write primitive is a generalization of a variety of atomic primitives that have been implemented in multiprocessor systems, our lower bound holds for all mutual exclusion algorithms that use such primitives. Furthermore, this lower bound is shown to be tight by presenting an algorithm with the matching upper bound.