Modern systems can place two or more processors on the same die (Chip Multiprocessors, CMP), each with its own private caches, while the last-level cache can be either private or shared. Because these systems are affected by the wire-delay problem, NUCA caches have been proposed to hide the effects of this delay and thereby increase performance. A CMP system that adopts a NUCA as its shared last-level cache must maintain coherence among the lowest, private levels of the cache hierarchy. As NUCA caches typically adopt a NoC as the communication infrastructure (in which the communication paradigm is message passing), the coherence protocol has to be directory-based, similar to those proposed for classical DSM systems. Previous works on NUCA-based CMP systems each assume a fixed topology (i.e., the physical position of cores and NUCA banks, and the communication infrastructure) and adopt different coherence strategies. In this paper, we present an evaluation of an 8-cpu CM...