This paper explores optimization techniques of the synchronization mechanisms for MPSoCs based on complex interconnect (Network-on-Chip), targeted at future powerefficient systems. The proposed solution is based on the idea of locally performing synchronization operations which require the continuous polling of a shared variable, thus featuring large contention (e.g. spin locks). We introduce a HW module, the Synchronization-operation Buffer (SB), which queues and manages the requests issued by the processors. Experimental validation has been carried out by using GRAPES, a cycle-accurate performance/power simulation platform. For 8-processor target architecture, we show that the proposed solution achieves up to 40% performance improvement and 30% energy saving with respect to synchronization based on directory-based coherence protocol.