In this paper we present a framework for a distributed and very low-cost implementation of synchronization controllers and protocols for embedded multiprocessors. The proposed architecture effectively implements the queued-lock semantics in a completely distributed way. The proposed approach to synchronization implementation not only completely eliminates the overwhelming bus contention traffic when multiple cores compete for a synchronization variable, but also achieves very high energy efficiency as the local synchronization controller can efficiently determine, without any bus transactions or local cache spinning, the exact timing of when the lock is made available to the local processor. Applicationspecific information regarding synchronization variables in the local task is exploited in implementing the distributed synchronization protocol. The local synchronization controllers enable the system software or the thread library to implement various low-power policies, such as d...