In chip multiprocessors (CMPs), data access latency depends on the memory hierarchy organization, the on-chip interconnect (NoC), and the running workload. Reducing data access latency is vital to achieving performance improvements and scalability of threaded applications. Multithreaded applications generally exhibit sharing of data among the program threads, which generates coherence and data traffic on the NoC. Many NoC designs exploit communication locality to reduce communication latency by configuring special fast paths on which communication is faster than the rest of the NoC. Communication patterns are directly affected by the cache organization. However, many cache organizations are designed in isolation of the underlying NoC or assume a simple NoC design, thus possibly missing optimization opportunities. In this work, we present a NoC-aware cache design that creates a symbiotic relationship between the NoC and cache to reduce data access latency, improve utilization of cache ca...
Ahmed Abousamra, Alex K. Jones, Rami G. Melhem