There are two important hurdles that restrict the scalability of directory-based shared-memory multiprocessors: the directory memory overhead and the long L2 miss latencies due to the indirection introduced by the accesses to directory information, usually stored in main memory. This work presents a lightweight directory architecture aimed at facing these two important problems. Our proposal takes advantage of the temporal locality exhibited by the accesses to the directory information and on-chip integration to design a directory protocol with the best characteristics of snoopy protocols. The lightweight directory architecture removes the directory structure from main memory and it stores directory information in the L2 cache avoiding in most cases the access to main memory. The proposed architecture is evaluated based on extensive execution-driven simulations of a 32-node cc-NUMA multiprocessor. Results demonstrate that the lightweight directory architecture achieves better performan...
Alberto Ros, Manuel E. Acacio, José M. Garc