Recently, there has been a surge of interest in using FPGAs for computer architecture research, with applications ranging from emulating and analyzing new platforms to accelerating microarchitectural simulation for design space exploration. This paper proposes and demonstrates a novel use of FPGAs: measuring the efficiency of coherence traffic in an actual computer system. Our approach employs an FPGA acting as a bus agent that interacts with a real CPU in a dual-processor system to measure the intrinsic delay of coherence traffic. This technique eliminates non-deterministic factors in the measurement, such as arbitration delay and stalls in the pipelined bus. It completely isolates the impact of pure coherence traffic delay on system performance while executing workloads natively. Our experiments show that the overall execution time of the benchmark programs on a system with coherence traffic was actually longer than on one without coherence traffic. This indicates that cache-to-cache ...
Taeweon Suh, Shih-Lien Lu, Hsien-Hsin S. Lee