In this paper, we propose a novel hardware caching technique, called switch directory, to reduce the communication latency in CC-NUMA multiprocessors. The main idea is to implement small fast directory caches in crossbar switches of the interconnect medium to captureand store ownershipinformation as the data flows from the memory module to the requesting processor. Using the stored information, the switch directory re-routes subsequent requests to dirty blocks directly to the owner cache, thus reducing the latency for home node processing such as slow DRAM directory access and coherence controller occupancies. The design and implementation details of a DiRectory Embedded Switch ARchitecture, DRESAR, are presented. We explore the performance benefits of switch directories by modeling DRESAR in a detailed execution driven simulator. Our results show that the switch directories can improve performance by up to 60% reduction in home node cache-to-cache transfers for several scientific ...
Ravi R. Iyer, Laxmi N. Bhuyan, Ashwini K. Nanda