Advances in semiconductor technologies have placed MPSoCs center stage as a standard architecture for embedded applications of ever increasing complexity. Efficient utilization of the ample hardware resources requires applications to be decomposed into fine-grained threads, engendering in turn a large amount of interprocessor communications. While fine-grained on-chip interconnects can reduce the data transfer overhead, the traditional synchronization mechanisms, such as spin locks and barriers, still cause significant contention in polling shared variables. To overcome this issue, in this paper we propose a light-weight distributed synchronization mechanism which statically encodes the semantically correct order of accesses to each shared variable. A sharp reduction in the number of code bits is attained through a reference coloring algorithm, which furthermore enables an implementation within negligible hardware overhead. This light-weight synchronization mechanism allows dependent ...