Although directory-based cache coherence protocols are the best choice when designing chip multiprocessor architectures (CMPs) with tens of processor cores on chip, the memory overhead introduced by the directory structure may not scale gracefully with the number of cores. In this work, we show that a directory organization based on duplicate tags, distributed among the tiles of a tiled CMP with fine-grained interleaving, is scalable: the size of each directory bank is independent of the number of tiles in the system. Moreover, building on this directory organization, we propose and evaluate the implicit replacements mechanism, which reduces the number of messages in the interconnection network by up to 32%.
Alberto Ros, Manuel E. Acacio, José M. Garc
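
The scalability claim in the abstract can be illustrated with a short back-of-the-envelope sketch. It is not taken from the paper: it assumes that the home tile of a block is chosen by low-order block-address bits that are also part of the private-cache set index, and the function name and the cache parameters (`l1_sets`, `l1_ways`) are hypothetical. Under that assumption, each bank only has to track a 1/N slice of the sets of each of the N private caches, so the per-bank entry count does not depend on N.

```python
def directory_bank_entries(num_tiles: int, l1_sets: int, l1_ways: int) -> int:
    """Duplicate-tag entries one directory bank must hold (illustrative only).

    Assumption (not from the source): the home bank is selected by
    low-order block-address bits that are a subset of the L1 set-index
    bits, so blocks homed at one bank can only occupy
    l1_sets / num_tiles sets in each private L1.
    """
    if num_tiles <= 0 or l1_sets % num_tiles != 0:
        raise ValueError("num_tiles must be positive and divide l1_sets")
    sets_tracked_per_cache = l1_sets // num_tiles
    # One duplicate tag per way, per tracked set, per private cache.
    return num_tiles * sets_tracked_per_cache * l1_ways


if __name__ == "__main__":
    # Hypothetical cache geometry: 128 sets, 4 ways per private L1.
    for tiles in (2, 4, 8, 16, 32, 64):
        print(tiles, directory_bank_entries(tiles, l1_sets=128, l1_ways=4))
```

In this sketch the factor `num_tiles` cancels, leaving `l1_sets * l1_ways` entries per bank for every tile count that divides the number of sets.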